Abstract

We initiate a systematic study of tolerant testers of image properties or, equivalently, algorithms
that approximate the distance from a given image to the desired property (that is, the
smallest fraction of pixels that need to change in the image to ensure that the image satisfies 
the desired property). Image processing is a particularly compelling area of applications for 
sublinear-time algorithms and, specifically, property testing. However, for testing algorithms to 
reach their full potential in image processing, they have to be tolerant, which allows them to be 
resilient to noise. Prior to this work, only one tolerant testing algorithm for an image property 
(image partitioning) has been published. 

We design efficient approximation algorithms for the following fundamental questions: What 
fraction of pixels have to be changed in an image so that it becomes a half-plane? a representation
of a convex object? a representation of a connected object? More precisely, our algorithms
approximate the distance to three basic properties (being a half-plane, convexity, and connectedness)
within a small additive error $\epsilon$, after reading a number of pixels polynomial in $1/\epsilon$ and
independent of the size of the image. The running time of the testers for half-plane and convexity
is also polynomial in $1/\epsilon$. Tolerant testers for these three properties were not investigated
previously. For convexity and connectedness, even the existence of distance approximation algorithms
with query complexity independent of the input size is not implied by previous work.

(It does not follow from the VC-dimension bounds, since the VC dimension of convexity and connectedness,
even in two dimensions, depends on the input size. It also does not follow from the
existence of non-tolerant testers.) 

Our algorithms require very simple access to the input: uniform random samples for the half-plane
property and convexity, and samples from uniformly random blocks for connectedness.
However, the analysis of the algorithms, especially for convexity, requires many geometric and
combinatorial insights. For example, in the analysis of the algorithm for convexity, we define a
set of reference polygons $P_\epsilon$ such that (1) every convex image has a nearby polygon in $P_\epsilon$ and
(2) one can use dynamic programming to quickly compute the smallest empirical distance to a
polygon in $P_\epsilon$. This construction might be of independent interest.
1 Introduction


Image processing is a particularly compelling area of applications for sublinear-time algorithms and, 
specifically, property testing. Images are huge objects, and our visual system manages to process 
them very quickly without examining every part of the image. Moreover, many applications in 
image analysis have to process a large number of images online, looking for an image that satisfies 
a certain property among images that are generally very far from satisfying it. Or, alternatively, 
they look for a subimage satisfying a certain property in a large image (e.g., a face in an image where 
most regions are part of the background.) There is a growing number of proposed rejection-based 
algorithms that employ a quick test that is likely to reject a large number of unsuitable images 
(see, e.g., citations in [16]). 

Property testing [23, 12] is a formal study of fast algorithms that accept objects with a given 
property and reject objects that are far. Testing image properties in this framework was first 
considered in [21]. Ron and Tsur [22] initiated property testing of images with a different input 
representation, suitable for testing properties of sparse images. Since these models were proposed, 
several sublinear-time algorithms for visual properties were implemented and used: namely, those 
by Kleiner et al. and Korman et al. [16, 17, 18]. 

However, for sublinear-time algorithms to reach their full potential in image processing, they 
have to be resilient to noise: images are often noisy, and it is undesirable to reject images that differ 
only on a small fraction of pixels from an image satisfying the desired property. Tolerant testing 
was introduced by Parnas, Ron and Rubinfeld [19] exactly with this goal in mind—to deal with 
noisy objects. It builds on the property testing model and calls for algorithms that accept objects 
that are close to having a desired property and reject objects that are far. Another related task
is approximating the distance of a given object to a nearest object with the property within additive
error $\epsilon$. (Distance approximation algorithms imply tolerant testers in a straightforward way: see
the remark after Definition 2.2.) The only image problem for which tolerant testers were studied
is the image partitioning problem investigated by Kleiner et al. [16]. 

Our results. We design efficient approximation algorithms for the following fundamental 
questions: What fraction of pixels have to be changed in an image so that it becomes a half-plane?
a representation of a convex object? a representation of a connected object? In other
words, we design algorithms that approximate the distance to being a half-plane, convexity and 
connectedness within a small additive error or, equivalently, tolerant testers for these properties. 
These problems were not investigated previously in the tolerant testing framework. For all three 
properties, we give $\epsilon$-additive distance approximation algorithms that run in constant time (i.e.,
dependent only on $\epsilon$, but not the size of the image). We remark that even though it was known that
these properties can be tested in constant time [21], this fact does not necessarily imply constant- 
query tolerant testers for these properties. For example, Fischer and Fortnow [10] exhibit a property (of
objects representable with strings of length $n$) which is testable with a constant number of queries,
but for which every tolerant tester requires $n^{\Omega(1)}$ queries. For convexity and connectedness, even
the existence of distance approximation algorithms with query (or time) complexity independent
of the input size does not follow from previous work. It does not follow from the VC-dimension
bounds, since the VC dimension of convexity and connectedness, even in two dimensions, depends on
the input size (see Footnote 1). Implications of the VC dimension bound on convexity are further discussed below.

Property        Sample Complexity                                  Run Time                                           Access to Input
Half-plane      $O(\frac{1}{\epsilon^2}\log\frac{1}{\epsilon})$    $O(\frac{1}{\epsilon^3}\log\frac{1}{\epsilon})$    uniformly random pixels
Convexity       $O(\frac{1}{\epsilon^2}\log\frac{1}{\epsilon})$    $O(\frac{1}{\epsilon^8})$                          uniformly random pixels
Connectedness   $O(\frac{1}{\epsilon^4})$                          $\exp(O(\frac{1}{\epsilon}))$                      uniformly random blocks of pixels

Table 1: Our results on distance approximation. To get the complexity of $(\epsilon_1, \epsilon_2)$-tolerant testing,
substitute $\epsilon = (\epsilon_2 - \epsilon_1)/2$.

Our results on distance approximation are summarized in Table 1. Our algorithm for convexity 
is the most important and technically difficult of our results, requiring a large number of new 
ideas to get running time polynomial in $1/\epsilon$. To achieve this, we define a set of reference polygons
$P_\epsilon$ such that (1) every convex image has a nearby polygon in $P_\epsilon$ and (2) one can use dynamic
programming to quickly compute the smallest empirical distance to a polygon in $P_\epsilon$. It turns
out that the empirical error of our algorithm is proportional to the sum of the square roots of 
the areas of the regions it considers in the dynamic program. To guarantee (2) and keep our 
empirical error small, our construction ensures that the sum of the square roots of the areas of the 
considered regions is small. This construction might be of independent interest. Our algorithms do 
not need sophisticated access to the input image: uniformly randomly sampled pixels suffice for our 
algorithms for the half-plane property and convexity. For connectedness, we allow our algorithms 
to query pixels from a uniformly random block. (See the end of Section 2 for a formal specification 
of the input access.) 

Our algorithms for convexity and half-plane work by first implicitly learning the object (see Footnote 2). PAC
learning was defined by Valiant [25], and agnostic learning, by Kearns et al. [15] and Haussler [13].
As a corollary of our analysis, we obtain fast proper agnostic PAC learners of half-planes and of
convex sets in two dimensions that work under the uniform distribution. The sample and time
complexity (see Footnote 3) of the PAC learners is as indicated in Table 1 for distance approximation algorithms
for corresponding properties. 

While the sample complexity of our agnostic half-plane learner (and hence our distance approximation
algorithm for half-planes) follows from the VC dimension bounds, its running time does
not. Agnostically learning half-spaces under the uniform distribution has been studied by [14],
but only for the hypercube $\{-1, 1\}^d$ domains, not the plane. Our PAC learner of convex sets, in
contrast to our half-plane learner, beats the VC dimension lower bounds on sample complexity. (The sample
complexity of a PAC learner for a class is at least proportional to the VC dimension of that class
[9].) Since the VC dimension of convexity of $n \times n$ images is $\Theta(n^{2/3})$, proper PAC learners of convex sets
in two dimensions (that work under arbitrary distributions) must have sample complexity $\Omega(n^{2/3})$.
However, one can do much better with respect to the uniform distribution. Schmeltz [24] showed
that a non-agnostic learner for that task needs $\Theta(\epsilon^{-3/2})$ samples. Surprisingly, it appears that this
question has not been studied at all for agnostic learners. Our agnostic learner for convex sets in
two dimensions under the uniform distribution needs $O(\frac{1}{\epsilon^2}\log\frac{1}{\epsilon})$ samples and runs in time $O(\frac{1}{\epsilon^8})$.

Footnote 1: For $n \times n$ images, the VC dimension of convexity is $\Theta(n^{2/3})$ (this is the maximum number of
vertices of a convex lattice polygon in an $n \times n$ lattice [1]); for connectedness, it is $\Theta(n)$.

Footnote 2: There is a known implication from learning to testing. As proved in [12], a proper PAC learning
algorithm for property $\mathcal{P}$ with sampling complexity $q(\epsilon)$ implies a 2-sided error (uniform) property tester
for $\mathcal{P}$ that takes $q(\epsilon/2) + O(1/\epsilon)$ samples. There is an analogous implication from proper agnostic PAC
learning to distance approximation with an overhead of $O(1/\epsilon^2)$ instead of $O(1/\epsilon)$. We choose to present
our testers first and get learners as a corollary because our focus is on testing and because we want
additional features for our testers, such as 1-sided error, that do not automatically follow from the
generic relationship.

Footnote 3: All our results are stated for error probability $\delta = 1/3$. To get results for general $\delta$, by
standard arguments, it is enough to multiply the complexity of an algorithm by $\log 1/\delta$.

Finally, we note that for connectedness, we take a different approach. Our algorithms do not 
try to learn the object first; instead they rely on a combinatorial characterization of distance 
to connectedness. We show that distance to connectedness can be represented as an average of 
distances of sub-images to a related property. 

Comparison to other related work. Property testing has a rich literature on graphs and
functions; however, properties of images have been investigated very little. Even though superficially
the inputs to various types of testing tasks might look similar, the problems that arise are
different. In the line of work on testing dense graphs, started by Goldreich et al. [12], the input is
also an $n \times n$ binary matrix, but it represents an adjacency matrix of the dense input graph. So,
the problems considered are different than in this work. In the line of work on testing geometric
properties, started by Czumaj, Sohler, and Ziegler [8] and Czumaj and Sohler [7], the input is a
set of points represented by their coordinates. The allowed queries and the distance measure on 
the input space are different from ours. 

A line of work potentially relevant for understanding connectedness of images is on connectedness
of bounded-degree graphs. Goldreich and Ron [11] gave a tester for this property, subsequently
improved by Berman et al. [2]. Campagna et al. [6] gave a tolerant tester for this problem. Even
though we view our image as a graph in order to define connectedness of images, there is a significant
difference in how distances between instances are measured (see [21] for details). We also
note that, unlike in [6], our tolerant tester for connectedness is fully tolerant, i.e., it works for all
settings of parameters. 

The only previously known tolerant tester for image properties was given by Kleiner et al. [16]. 
They consider the following class of image partitioning problems, each specified by a $k \times k$ binary
template matrix $T$ for a small constant $k$. The image satisfies the property corresponding to $T$ if
it can be partitioned by $k - 1$ horizontal and $k - 1$ vertical lines into blocks, where each block has
the same color as the corresponding entry of $T$. Kleiner et al. prove that $O(1/\epsilon^2)$ samples suffice
for tolerant testing of image partitioning properties. Note that the VC dimension of such a property
is $O(1)$, so by Footnote 2, we can get a $O(1/\epsilon^2 \log 1/\epsilon)$ bound. Our algorithms required numerous
new ideas to significantly beat VC dimension bounds (for convexity and connectedness) and to get 
low running time. 

For the properties we study, distance approximation algorithms and tolerant testers were not 
investigated previously. In the standard property testing model, the half-plane property can be
tested in $O(\epsilon^{-1})$ time [21], convexity can be tested in $O(\epsilon^{-4/3})$ time [4], and connectedness can
be tested in $O(\epsilon^{-2} \log \epsilon^{-1})$ time [21, 2]. As we explained, property testers with running time
independent of the input size do not necessarily imply tolerant testers with that feature. Many new ideas
are needed to obtain our tolerant testers. In particular, the standard testers for half-plane and
connectedness are adaptive while the testers here need only random samples from the image, so the
techniques used for analyzing them are different. The tester for convexity in [4] uses only random
samples, but it is not based on dynamic programming. 

Open questions. In this paper we give tolerant testers for several important problems on 
images. It is open whether these testers are optimal. No nontrivial lower bounds are known for 
these problems. (For any non-trivial property, an easy lower bound on the query complexity of a
distance approximation algorithm is $\Omega(1/\epsilon^2)$. This follows from the fact that $\Omega(1/\epsilon^2)$ coin flips are
needed to distinguish between a fair coin and a coin that lands heads with probability $1/2 + \epsilon$.)
Thus, our testers for half-plane and convexity are nearly optimal in terms of query complexity (up
to a logarithmic factor in $1/\epsilon$). But it is open whether their running time can be improved.

Organization. We give formal definitions and notation in Section 2. Algorithms for being a 
half-plane, convexity, and connectedness are given in Sections 3, 4, and 5, respectively. The sections 
presenting algorithms for being a half-plane and convexity start by giving a distance approximation 
algorithm and conclude with the corollary about the corresponding PAC learner. 

2 Definitions and Notation 

We use $[0..n)$ to denote the set of integers $\{0, 1, \ldots, n - 1\}$ and $[n]$ to denote $\{1, 2, \ldots, n\}$. By log
we mean the logarithm base 2, and by ln, the logarithm base $e$.

Image representation. We focus on black and white images. For simplicity, we only consider 
square images, but everything in this paper can be easily generalized to rectangular images. We 
represent an image by an $n \times n$ binary matrix $M$ of pixel values, where 0 denotes white and 1
denotes black. We index the matrix by $[0..n)^2$. The object is a subset of $[0..n)^2$ corresponding to
black pixels; namely, $\{(i,j) \mid M[i,j] = 1\}$. The left border of the image is the set $\{(0, j) \mid j \in [0..n)\}$.
The right, top and bottom borders are defined analogously. The image border is the set of pixels 
on all four borders. 

For any region R , we use A(R) to denote its area. 

Distance to a property. The absolute distance, $Dist(M_1, M_2)$, between matrices $M_1$ and
$M_2$ is the number of the entries on which they differ. The relative distance between them is
$dist(M_1, M_2) = Dist(M_1, M_2)/n^2$. A property $\mathcal{P}$ is a subset of binary matrices. The distance of an
image represented by matrix $M$ to a property $\mathcal{P}$ is $dist(M, \mathcal{P}) = \min_{M' \in \mathcal{P}} dist(M, M')$. An image
is $\epsilon$-far from the property if its distance to the property is at least $\epsilon$; otherwise, it is $\epsilon$-close to it.
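
The distance definitions above can be illustrated with a small brute-force sketch. It assumes an image is stored as a list of rows of 0/1 values and that the property is given as an explicit list of its member matrices; both choices, and the function names, are ours and are only practical for toy instances.

```python
def absolute_distance(m1, m2):
    """Dist(M1, M2): number of entries on which two n x n binary matrices differ."""
    return sum(a != b for row1, row2 in zip(m1, m2) for a, b in zip(row1, row2))


def relative_distance(m1, m2):
    """dist(M1, M2) = Dist(M1, M2) / n^2."""
    n = len(m1)
    return absolute_distance(m1, m2) / (n * n)


def distance_to_property(m, members):
    """dist(M, P) by brute force; `members` explicitly lists the matrices in P."""
    return min(relative_distance(m, candidate) for candidate in members)


# Toy property (hypothetical, for illustration): the image is monochromatic.
image = [[1, 0], [0, 0]]
monochromatic = [[[0, 0], [0, 0]], [[1, 1], [1, 1]]]
print(distance_to_property(image, monochromatic))  # 0.25: one of four pixels must change
```

The algorithms in this paper never enumerate the property in this way; the sketch only fixes what quantity they approximate.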

Computational Tasks. We consider several computational tasks: tolerant testing [19], additive
approximation of the distance to the property, and proper (agnostic) PAC learning [25, 15, 13].
Here we define them specifically for properties of images. 

Definition 2.1 (Tolerant tester). An $(\epsilon_1, \epsilon_2)$-tolerant tester for a property $\mathcal{P}$ is a randomized
algorithm that, given two parameters $\epsilon_1, \epsilon_2 \in (0, 1/2)$ such that $\epsilon_1 < \epsilon_2$ and access to an $n \times n$
binary matrix $M$,

1. accepts with probability at least 2/3 if $dist(M, \mathcal{P}) \le \epsilon_1$;

2. rejects with probability at least 2/3 if $dist(M, \mathcal{P}) \ge \epsilon_2$.

Definition 2.2 (Distance approximation algorithm). An $\epsilon$-additive distance approximation algorithm
for a property $\mathcal{P}$ is a randomized algorithm that, given an error parameter $\epsilon \in (0, 1/4)$ and
access to an $n \times n$ binary matrix $M$, outputs a value $\hat{d} \in [0, 1/2]$ that with probability at least 2/3
satisfies $|\hat{d} - dist(M, \mathcal{P})| \le \epsilon$.

As observed in [19], we can obtain an $(\epsilon_1, \epsilon_2)$-tolerant tester for any property $\mathcal{P}$ by running a
distance approximation algorithm for $\mathcal{P}$ with $\epsilon = (\epsilon_2 - \epsilon_1)/2$. Thus, all our distance approximation
algorithms directly imply tolerant testers.
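
The reduction in this remark can be sketched as a thin wrapper. The sketch assumes a hypothetical callable `approximate_distance(image, eps)` that behaves like an algorithm from Definition 2.2; the acceptance threshold at the midpoint $(\epsilon_1 + \epsilon_2)/2$ is our reading of the standard argument, and an estimate landing exactly on the midpoint is resolved toward acceptance here.

```python
def make_tolerant_tester(approximate_distance):
    """Wrap a distance approximation routine into an (eps1, eps2)-tolerant tester.

    `approximate_distance(image, eps)` is a hypothetical callable returning an
    estimate that is within eps of the true distance with probability >= 2/3.
    """
    def tester(image, eps1, eps2):
        if not 0 < eps1 < eps2 < 0.5:
            raise ValueError("need 0 < eps1 < eps2 < 1/2")
        eps = (eps2 - eps1) / 2
        d_hat = approximate_distance(image, eps)
        # If dist <= eps1 the estimate is at most the midpoint; if dist >= eps2
        # it is at least the midpoint. Accept on the lower side.
        return d_hat <= (eps1 + eps2) / 2

    return tester
```

With a correct approximator, the wrapper inherits the success probability 2/3 of the underlying algorithm, since it only compares one estimate with a fixed threshold.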

Definition 2.3 (Proper agnostic PAC learner). A proper agnostic PAC learning algorithm for class
$\mathcal{P}$ that works under the uniform distribution is given a parameter $\epsilon \in (0, 1/2)$ and access to an image
$M$. It can draw independent uniformly random samples $(i,j)$ and obtain $(i,j)$ and $M[i,j]$. With
probability at least 2/3, it must output an image $M' \in \mathcal{P}$ such that $dist(M, M') \le dist(M, \mathcal{P}) + \epsilon$.

Access to the input. A query-based algorithm accesses its $n \times n$ input matrix $M$ by specifying
a query pixel $(i,j)$ and obtaining $M[i,j]$. The query complexity of the algorithm is the number of
pixels it queries. A query-based algorithm is adaptive if its queries depend on answers to previous
queries and nonadaptive otherwise. A uniform algorithm accesses its $n \times n$ input matrix by drawing
independent samples $(i,j)$ from the uniform distribution over the domain (i.e., $[0..n)^2$) and obtaining
$M[i,j]$. A block-uniform algorithm accesses its $n \times n$ input matrix by specifying a block length
$r \in [n]$. For a block length $r$ of its choice, the algorithm draws $x, y \in [\lceil n/r \rceil]$ uniformly at random
and obtains the set $\{(i,j) \mid \lfloor i/r \rfloor = x \text{ and } \lfloor j/r \rfloor = y\}$ and $M[i,j]$ for all $(i,j)$ in this set. The sample
complexity of a uniform or a block-uniform algorithm is the number of pixels of the image it
examines.
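
The two sampling-based access models can be sketched as follows. The sketch assumes the image is a list of rows and that blocks are indexed from 0, so that a block index equals $\lfloor i/r \rfloor$ for its pixels; the function names and the explicit `rng` argument are illustrative choices, not part of the model.

```python
import random


def uniform_sample(image, rng):
    """Draw one pixel (i, j) uniformly from [0..n)^2 and return it with M[i, j]."""
    n = len(image)
    i, j = rng.randrange(n), rng.randrange(n)
    return (i, j), image[i][j]


def block_uniform_sample(image, r, rng):
    """Draw a uniformly random r x r block and return all its labeled pixels.

    Blocks are indexed from 0 (an assumption of this sketch); blocks touching
    the right or bottom border are truncated when r does not divide n.
    """
    n = len(image)
    blocks_per_side = -(-n // r)  # ceil(n / r)
    x, y = rng.randrange(blocks_per_side), rng.randrange(blocks_per_side)
    return {
        (i, j): image[i][j]
        for i in range(x * r, min((x + 1) * r, n))
        for j in range(y * r, min((y + 1) * r, n))
    }
```

Each call to `block_uniform_sample` contributes the size of the returned block to the sample complexity, in line with the definition above.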

Remark 2.1. Uniform algorithms have access to independent (labeled) samples from the uniform 
distribution over the domain. Sometimes it is more convenient to design Bernoulli algorithms that 
only have access to (labeled) Bernoulli samples from the image: namely, each pixel appears in the 
sample with probability $s/n^2$, where $s$ is the sample parameter that controls the expected sample
complexity. By standard arguments, a Bernoulli algorithm with the sample parameter $s$ can be used
to obtain a uniform algorithm that takes $O(s)$ samples and has the same guarantees as the original
algorithm (and vice versa). 
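
For concreteness, the Bernoulli sampling distribution of Remark 2.1 can be sketched as below. This is only an illustration of the distribution: the loop visits all $n^2$ pixels, which a sublinear-time implementation would avoid, and the function name is ours.

```python
import random


def bernoulli_sample(image, s, rng):
    """Include each pixel independently with probability s / n^2.

    The expected number of returned labeled pixels is s (capped at n^2).
    Note: this illustrative loop takes time n^2, unlike a sublinear sampler.
    """
    n = len(image)
    p = min(1.0, s / (n * n))
    return [
        ((i, j), image[i][j])
        for i in range(n)
        for j in range(n)
        if rng.random() < p
    ]
```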

3 Distance Approximation to the Nearest Half-Plane Image 

An image is called a half-plane image if there exist an angle $\varphi \in [0, 2\pi)$ and a real number $c$ such
that pixel $(x, y)$ is black in the image iff $x \cos\varphi + y \sin\varphi \ge c$. The line $x \cos\varphi + y \sin\varphi = c$, denoted
$L^\varphi_c$, is a separating line of the half-plane image, i.e., it separates black and white pixels of the image.
We call $\varphi$ the direction of the half-plane image (and of $L^\varphi_c$). Note that $\varphi$ is the oriented angle between
the x-axis and a line perpendicular to $L^\varphi_c$. For all $\varphi \in [0, 2\pi)$ and $c \in \mathbb{R}$, the half-plane image with
a separating line $L^\varphi_c$ is denoted $M^\varphi_c$ and the closed half-plane whose every point $(x, y)$ satisfies the
inequality $x \cos\varphi + y \sin\varphi \ge c$ is denoted $H^\varphi_c$. We can think of a half-plane image as a discretized
half-plane.

Theorem 3.1. For $\epsilon \in (\frac{90}{n}, \frac{1}{4})$, there is a uniform $\epsilon$-additive distance approximation algorithm for
the half-plane property with sample complexity $O(\frac{1}{\epsilon^2} \log \frac{1}{\epsilon})$ and time complexity $O(\frac{1}{\epsilon^3} \log \frac{1}{\epsilon})$.

Proof. At a high level, our algorithm for approximating the distance to being a half-plane (Algorithm 1)
constructs a small set $\mathcal{M}_\epsilon$ of reference half-plane images. It samples pixels uniformly
at random and outputs the empirical distance to the closest reference half-plane image. The core
property of $\mathcal{M}_\epsilon$ is that the smallest empirical distance to a half-plane image in $\mathcal{M}_\epsilon$ can be computed
quickly.

Definition 3.1 (Reference directions and half-planes). Given $\epsilon \in (0, \frac{1}{4})$, let $a = \epsilon n/\sqrt{2}$. Let
$D_\epsilon$ be the set of directions of the form $i\epsilon$ for $i \in [0..\lceil 2\pi/\epsilon \rceil)$, called reference directions. The set
of reference half-plane images, denoted $\mathcal{M}_\epsilon$, consists of every half-plane image for which $L^\varphi_c$ is a
separating line, where $\varphi \in D_\epsilon$ and $c$ is an integer multiple of $a$.

In other words, for every reference direction, we space separating lines of reference half-plane
images distance $a$ apart. By definition, there are at most $\sqrt{2}\,n/a = 2/\epsilon$ reference half-plane images
for each direction in $D_\epsilon$ and, consequently, $|\mathcal{M}_\epsilon| \le 2\pi/\epsilon \cdot (2/\epsilon) \le 13/\epsilon^2$.


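Algorithm 1: Distance approximation to being a half-plane.
input: parameters $n \in \mathbb{N}$, $\epsilon \in (90/n, 1/4)$; Bernoulli access to an $n \times n$ binary matrix $M$.

1 Sample a set $S$ of $s = \Theta(\frac{1}{\epsilon^2} \ln \frac{1}{\epsilon})$ pixels uniformly at random with replacement, where
  $s \ge \frac{2.6}{\epsilon^2}(\ln |\mathcal{M}_\epsilon| + \ln 6)$, as required by the analysis.

2 Let $D_\epsilon$, $\mathcal{M}_\epsilon$ be the sets of reference directions and half-plane images, respectively (see
  Definition 3.1), and $a = \epsilon n/\sqrt{2}$.

  // Compute $\hat{d} = \min_{M' \in \mathcal{M}_\epsilon} \hat{d}(M')$, where $\hat{d}(M') = \frac{1}{s} \cdot |\{p \in S : M[p] \ne M'[p]\}|$:

3 foreach $\varphi \in D_\epsilon$ do

  // Lines with direction $\varphi$ partition the image. Bucket sort samples by
  // position in the partition:

4   Assign each sample $(x, y) \in S$ to bucket $j = \lfloor (x \cos\varphi + y \sin\varphi)/a \rfloor$.

5   For each bucket $j$, compute $w_j$ and $b_j$, the number of white and black pixels it has.

6   For each $j$, where $M^\varphi_{ja} \in \mathcal{M}_\epsilon$, compute $\hat{d}(M^\varphi_{ja}) = \frac{1}{s} \sum_{k < j} b_k + \frac{1}{s} \sum_{k \ge j} w_k$.

7 Output $\hat{d}$, the minimum of the values computed in Step 6.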
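
The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under several assumptions of ours: the image is a list of rows indexed as `image[x][y]`, the number of samples `s` is passed in by the caller rather than fixed by a formula, the buckets are swept in sorted order (costing $O(s \log s)$ per direction instead of the bucket sort used in the analysis), and thresholds that make the image entirely black or entirely white are included among the candidates.

```python
import math
import random


def approx_distance_to_half_plane(image, eps, s, rng):
    """Estimate the distance from `image` to the nearest half-plane image."""
    n = len(image)
    a = eps * n / math.sqrt(2)  # spacing between parallel reference lines
    # Step 1: s labeled pixels drawn uniformly at random with replacement.
    samples = []
    for _ in range(s):
        x, y = rng.randrange(n), rng.randrange(n)
        samples.append((x, y, image[x][y]))

    best = s  # smallest number of disagreements found so far
    for i in range(math.ceil(2 * math.pi / eps)):  # Step 3: reference directions
        phi = i * eps
        cos_phi, sin_phi = math.cos(phi), math.sin(phi)
        white, black = {}, {}
        for x, y, color in samples:  # Steps 4-5: per-bucket color counts
            j = math.floor((x * cos_phi + y * sin_phi) / a)
            counts = black if color == 1 else white
            counts[j] = counts.get(j, 0) + 1
        # Step 6: a threshold at bucket j colors buckets >= j black, so it
        # disagrees with black samples below j and white samples at or above j.
        mismatches = sum(white.values())  # threshold at the lowest bucket
        best = min(best, mismatches)
        for j in sorted(set(white) | set(black)):  # move threshold past bucket j
            mismatches += black.get(j, 0) - white.get(j, 0)
            best = min(best, mismatches)
    return best / s  # Step 7
```

For example, on a $500 \times 500$ image whose right half is black, calling the function with `eps = 0.2` and a few thousand samples should return a small value, since some reference line lies within distance $a$ of the true boundary.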


Lemma 3.2. For every half-plane image $M$, there is $M' \in \mathcal{M}_\epsilon$ such that $dist(M, M') \le \epsilon/1.8$.

Proof. We mentioned that a half-plane image can be viewed as a discretized half-plane. Next we
define a set of half-planes that we use in the proof of the lemma.

Definition 3.2. The set of reference half-planes, denoted $\mathcal{H}_\epsilon$, consists of every half-plane $H^\varphi_c$,
where $\varphi \in D_\epsilon$ and $c$ is an integer multiple of $a$.

Claim 3.3. For every half-plane $H$, there is a half-plane $H' \in \mathcal{H}_\epsilon$ such that the area of the
symmetric difference of $H$ and $H'$ is at most $\epsilon n^2/2$.

Proof. Consider a half-plane $H^\varphi_c$. Let $\varphi'$ be a reference direction closest to $\varphi$. Then $|\varphi - \varphi'| \le \epsilon/2$.
We consider two cases. See Figures 3.1 and 3.2.

Figure 3.1: Proof of Lemma 3.2: triangular regions.

Figure 3.2: Proof of Lemma 3.2: triangular and quadrilateral regions.

Case 1: Suppose that there is a reference half-plane $H^{\varphi'}_{c'}$ such that the lines $L^\varphi_c$ and $L^{\varphi'}_{c'}$
intersect inside $[0, n-1]^2$. Note that the length of every line segment inside $[0, n-1]^2$ is
at most $\sqrt{2}\,n$. The symmetric difference of $H^\varphi_c$ and $H^{\varphi'}_{c'}$ inside $[0, n-1]^2$ consists of two regions
formed by lines $L^\varphi_c$ and $L^{\varphi'}_{c'}$. Each of these regions is either a triangle or (if it contains a corner
of the image) a quadrilateral. First, suppose both regions are triangles. The sum of lengths of
their bases, which lie on the same line, is at most $\sqrt{2}\,n$, whereas the sum of their heights is at most
$\sin(\epsilon/2) \cdot \sqrt{2}\,n \le \epsilon n/\sqrt{2}$. Hence, the sum of their areas is at most $\epsilon n^2/2$.

If exactly one of the regions is a quadrilateral, we add a line through the corner of the image
contained in the quadrilateral and the intersection point of $L^\varphi_c$ and $L^{\varphi'}_{c'}$. It partitions the symmetric
difference of $H^\varphi_c$ and $H^{\varphi'}_{c'}$ into two pairs of triangular regions. Let $\varphi_1$ (respectively, $\varphi_2$) be the angle
between the new line and $L^\varphi_c$ (respectively, $L^{\varphi'}_{c'}$). Then $\varphi_1 + \varphi_2 \le \epsilon/2$. Applying the same reasoning
as before to each pair of regions, we get that the sum of their areas is at most $\varphi_1 n^2 + \varphi_2 n^2 \le \epsilon n^2/2$.
If both regions are quadrilaterals, we add a line as before for each of them and apply the same
reasoning as before to the three resulting pairs of regions. Again, the area of the symmetric
difference of $H^\varphi_c$ and $H^{\varphi'}_{c'}$ is at most $\epsilon n^2/2$. Thus, $H^{\varphi'}_{c'}$ is the required $H'$.

Case 2: There exist reference half-planes $H^{\varphi'}_{c'}$ and $H^{\varphi'}_{c'+a}$ such that the line $L^\varphi_c$ is between
$L = L^{\varphi'}_{c'}$ and $L' = L^{\varphi'}_{c'+a}$. The region between $L$ and $L'$ inside the image has length at most $\sqrt{2}\,n$
and width $a$. Thus, its area is at most $\epsilon n^2$. Partition it into two regions: between $L$ and $L^\varphi_c$ and
between $L'$ and $L^\varphi_c$. One of the two regions has area at most $\epsilon n^2/2$. Thus, $H^{\varphi'}_{c'}$ or $H^{\varphi'}_{c'+a}$ is the
required $H'$. □

To complete the proof of the lemma we use the following theorem that relates the area of a 
lattice polygon and the number of integer points that the polygon covers. (A lattice polygon is a 
polygon whose vertices have integer coordinates.) 

Theorem 3.4 (Pick's theorem [20]). For a simple lattice polygon $G$, let $\alpha$ denote the number of
lattice points in the interior of $G$ and $\beta$ denote the number of lattice points on the boundary of $G$.
Then $A(G) = \alpha + \beta/2 - 1$.




Definition 3.3. For a polygon G, let Perim(G ) denote the perimeter of G and Pix(G) denote the 
number of pixels in G, i.e., pixels in the interior of G and on its boundary. 

Proposition 3.5. Let $G$ be a convex polygon. Then $Pix(G) \le A(G) + Perim(G)/2 + 1$.

Proof. If all pixels in $G$ are collinear then $Pix(G) \le Perim(G)/2 + 1 \le A(G) + Perim(G)/2 + 1$.
This follows from the fact that the length of a line segment inside a polygon is at most half of the
perimeter of the polygon and that the number of integer points on the line segment is at most the
length of the line segment plus one. If not all pixels in $G$ are collinear then consider the convex hull
of all pixels in $G$. Let $\alpha$ and $\beta$ denote the number of pixels in the interior and on the boundary of
that convex hull, respectively. (Note that the convex hull is a lattice polygon.) By Theorem 3.4, we
obtain that $\alpha + \beta/2 - 1 \le A(G)$ and $Pix(G) = \alpha + \beta \le A(G) + \beta/2 + 1 \le A(G) + Perim(G)/2 + 1$. □
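
Pick's theorem and the bound of Proposition 3.5 can be checked numerically on small examples. The sketch below is restricted to simple lattice polygons given as lists of integer vertices in order (Proposition 3.5 itself covers arbitrary convex polygons); the helper names are ours.

```python
from math import gcd, hypot


def edges(polygon):
    """Consecutive vertex pairs of a polygon given as a list of (x, y) vertices."""
    return list(zip(polygon, polygon[1:] + polygon[:1]))


def area(polygon):
    """A(G) by the shoelace formula."""
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in edges(polygon))) / 2


def boundary_points(polygon):
    """beta: lattice points on the boundary (gcd count per edge)."""
    return sum(gcd(abs(x2 - x1), abs(y2 - y1)) for (x1, y1), (x2, y2) in edges(polygon))


def perimeter(polygon):
    return sum(hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in edges(polygon))


def pixel_count(polygon):
    """Pix(G) = alpha + beta, with alpha obtained from Pick's theorem."""
    beta = boundary_points(polygon)
    alpha = round(area(polygon) - beta / 2 + 1)
    return alpha + beta


square = [(0, 0), (3, 0), (3, 3), (0, 3)]
print(pixel_count(square), area(square) + perimeter(square) / 2 + 1)  # 16 16.0
```

The square shows that the bound of Proposition 3.5 is tight for axis-parallel lattice rectangles.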

For some $\varphi$ and $c$, half-plane image $M = M^\varphi_c$. Consider the half-plane $H^\varphi_c$. By Claim 3.3, there
is a half-plane $H^{\varphi'}_{c'}$ such that the area of the symmetric difference of $H^\varphi_c$ and $H^{\varphi'}_{c'}$ is at most $\epsilon n^2/2$,
where $\varphi' \in D_\epsilon$ and $c'$ is a multiple of $a$.

Recall that there are 4 cases for the symmetric difference of $H = H^\varphi_c$ and $H' = H^{\varphi'}_{c'}$. More precisely, it consists
of: 1) two triangles, 2) a triangle and a quadrilateral, 3) a quadrilateral, or 4) two quadrilaterals.
We consider the last case (this is the hardest case and the three other cases are handled similarly).
Let the symmetric difference of $H$ and $H'$ consist of two quadrilaterals $Q_1$ and $Q_2$. (For reference,
see Figure 3.2 where a triangle and a quadrilateral are shown.) Every line segment in the image has
length at most $\sqrt{2}\,n$. Thus, $Perim(Q_1) + Perim(Q_2) \le 6\sqrt{2}\,n$. By Proposition 3.5, we obtain that
$Pix(Q_1) + Pix(Q_2) \le A(Q_1) + A(Q_2) + (Perim(Q_1) + Perim(Q_2))/2 + 2 \le \epsilon n^2/2 + 3\sqrt{2}\,n + 2 \le \epsilon n^2/1.8$
(recall that $\epsilon \in (90/n, 1/4)$). Since $M$ and $M' = M^{\varphi'}_{c'}$ differ only on pixels in the symmetric
difference, $dist(M, M') \le \epsilon/1.8$. This completes the proof. □
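
The last inequality in the proof of Lemma 3.2 can be verified directly from the assumption $\epsilon > 90/n$. The following short derivation is our own expansion of that step; note that $\epsilon < 1/4$ and $\epsilon > 90/n$ together force $n > 360$, so the requirement $n \ge 3$ below is automatically met.

```latex
\begin{align*}
\frac{\epsilon n^2}{1.8} - \frac{\epsilon n^2}{2}
  &= \frac{\epsilon n^2}{18}
   > \frac{90}{n} \cdot \frac{n^2}{18} = 5n, \\
3\sqrt{2}\,n + 2 &< 4.25\,n + 2 \le 5n \qquad \text{for } n \ge 3, \\
\text{hence}\quad \frac{\epsilon n^2}{2} + 3\sqrt{2}\,n + 2
  &< \frac{\epsilon n^2}{2} + \frac{\epsilon n^2}{18} = \frac{\epsilon n^2}{1.8}.
\end{align*}
```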


Analysis of Algorithm 1. Let $d_M$ be the distance of $M$ to being a half-plane. Then there
exists a half-plane matrix $M^*$ such that $dist(M, M^*) = d_M$. By a uniform convergence bound
(see, e.g., [5]), since $s \ge (2.6/\epsilon^2)(\ln |\mathcal{M}_\epsilon| + \ln 6)$ for all $\epsilon \in (0, 1/4)$, we get that with probability at
least 2/3, $|dist(M, M') - \hat{d}(M')| \le \epsilon/2.25$ for all $M' \in \mathcal{M}_\epsilon$. Suppose this event happened. Then
$\hat{d} \ge d_M - \epsilon/2.25$ because $dist(M, M') \ge d_M$ for all half-planes $M'$. Moreover, by Lemma 3.2,
there is a matrix $\hat{M} \in \mathcal{M}_\epsilon$ such that $dist(M, \hat{M}) \le dist(M, M^*) + dist(M^*, \hat{M}) \le d_M + \epsilon/1.8$. For
this matrix, $\hat{d}(\hat{M}) \le dist(M, \hat{M}) + \epsilon/2.25 \le d_M + \epsilon$. Thus, $d_M - \epsilon/2.25 \le \hat{d} \le d_M + \epsilon$. That is,
$|d_M - \hat{d}| \le \epsilon$ with probability at least 2/3, as required.
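
The constants in this analysis can be sanity-checked numerically. The sketch below assumes that the uniform convergence bound is instantiated as Hoeffding's inequality combined with a union bound over the at most $13/\epsilon^2$ reference images; the source cites [5] without spelling out the instantiation, so this is one plausible reading, and the function names are ours.

```python
import math


def reference_count_bound(eps):
    """Upper bound 13 / eps^2 on the number of reference half-plane images."""
    return 13 / eps ** 2


def required_samples(eps):
    """Smallest integer s with s >= (2.6 / eps^2) * (ln|M_eps| + ln 6)."""
    return math.ceil(2.6 / eps ** 2 * (math.log(reference_count_bound(eps)) + math.log(6)))


def failure_probability_bound(eps, s):
    """Hoeffding plus union bound (assumed instantiation) for deviation eps / 2.25."""
    t = eps / 2.25
    return reference_count_bound(eps) * 2 * math.exp(-2 * s * t * t)


for eps in (0.01, 0.05, 0.1, 0.2):
    s = required_samples(eps)
    print(eps, s, failure_probability_bound(eps, s) <= 1 / 3)
```

The two error terms also add up exactly as used in the analysis: $\epsilon/1.8 + \epsilon/2.25 = 5\epsilon/9 + 4\epsilon/9 = \epsilon$.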

Sample and time complexity. The number of samples, $s$, is $O(1/\epsilon^2 \log 1/\epsilon)$. To analyze
the running time, recall that $|D_\epsilon| = O(1/\epsilon)$. For each direction in $D_\epsilon$, we perform a bucket
sort of all samples in expected $O(s)$ time. The remaining steps in the foreach loop of Step 3
can also be implemented to run in $O(s)$ time. The expected running time of Algorithm 1 is thus
$O(1/\epsilon \cdot s) = O(1/\epsilon^3 \log 1/\epsilon)$. Remark 2.1 implies a tester with the same worst case running time. □

Corollary 3.6. The class of half-plane images is properly agnostically PAC-learnable with sample
complexity $O(\frac{1}{\epsilon^2} \log \frac{1}{\epsilon})$ and time complexity $O(\frac{1}{\epsilon^3} \log \frac{1}{\epsilon})$ under the uniform distribution.

Proof. We can modify Algorithm 1 to output, along with $\hat{d} = \min_{M' \in \mathcal{M}_\epsilon} \hat{d}(M')$, a reference half-plane
image $\hat{M}$ that minimizes it. By the analysis of Algorithm 1, with probability at least 2/3, the
output $\hat{M}$ satisfies $dist(M, \hat{M}) \le d_M + \epsilon$. □
4 Distance Approximation to the Nearest Convex Image

An image is convex if the convex hull of all black pixels contains only black pixels. 
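
The definition of a convex image can be checked exactly on small inputs with a brute-force sketch: compute the convex hull of the black pixels and verify that no white pixel lies in it. The code below assumes the image is a list of rows of 0/1 values and uses Andrew's monotone chain for the hull; it reads the whole image, so it illustrates the property rather than any sublinear algorithm.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_hull(hull, p):
    """True if p lies inside or on the boundary of the hull."""
    if len(hull) == 1:
        return p == hull[0]
    if len(hull) == 2:  # degenerate hull: a segment
        a, b = hull
        return (cross(a, b, p) == 0
                and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
                and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))
    return all(cross(hull[k], hull[(k + 1) % len(hull)], p) >= 0
               for k in range(len(hull)))


def is_convex_image(image):
    """True if the convex hull of the black pixels contains only black pixels."""
    n = len(image)
    black = [(i, j) for i in range(n) for j in range(n) if image[i][j] == 1]
    if not black:
        return True  # the empty object is treated as convex in this sketch
    hull = convex_hull(black)
    return all(image[i][j] == 1
               for i in range(n) for j in range(n) if in_hull(hull, (i, j)))
```

For instance, a plus-shaped object in a $3 \times 3$ image is convex under this definition, because its hull (a diamond) contains no other lattice points, whereas an L-shaped object is not.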

Theorem 4.1. For $\epsilon \in (n^{-1/6}, 1/4)$, there is a uniform $\epsilon$-additive distance approximation algorithm
for convexity with sample complexity $O(\frac{1}{\epsilon^2} \log \frac{1}{\epsilon})$ and running time $O(\frac{1}{\epsilon^8})$.

Proof. The starting point for our algorithm for approximating the distance to convexity (Algorithm 2)
is similar to that of Algorithm 1, which approximates the distance to a nearest half-plane.
We define a small set $P_\epsilon$ of reference polygons. Algorithm 2 implicitly learns a nearby reference
polygon and outputs the empirical distance from the image to that polygon. The key features
of $P_\epsilon$ are that (1) every convex image has a nearby polygon in $P_\epsilon$, and (2) one can use dynamic
programming (DP) to quickly compute the smallest empirical distance to a polygon in $P_\epsilon$.

We start by defining reference directions, lines, points, and line-point pairs that are later used 
to specify our DP instances. Reference directions are almost the same as in Definition 3.1. 

Definition 4.1 (Reference lines, line-point pairs). Fix $\epsilon_0 = \epsilon/144$. The set of reference directions
is $D_\epsilon = \{\pi/2\} \cup \{i\epsilon_0 : i \in [0..\lceil 2\pi/\epsilon_0 \rceil)\}$. For every $\varphi \in D_\epsilon$, define the set of reference lines
$L_\varphi = \{\ell \mid \ell$ passes through the image and satisfies the equation $x \cos\varphi + y \sin\varphi = c$, where $c$ is an
integer multiple of $\epsilon_0 n\}$. For each reference line $\ell$, the set of reference points w.r.t. $\ell$ contains points
on $\ell$, which are inside $[0, n-1]^2$, spaced exactly $\epsilon_0 n$ apart (it does not matter how the initial
point is picked). A line-point pair is a pair $(\ell, b)$, where $\ell$ is a reference line and $b$ is a reference
point w.r.t. $\ell$. (Note that there could be reference points on $\ell$ that were defined w.r.t. some other
reference line. This is why we say “a reference point w.r.t. $\ell$”, and not “a reference point on $\ell$”.)

Roughly speaking, a reference polygon is a polygon whose vertices are defined by line-point pairs. 
There are additional restrictions that stem from the fact that we need to be able to efficiently find 
a nearby reference polygon for an input image. The actual definition specifies which actions we can 
take while constructing a reference polygon. Reference polygons are built starting from reference 
boxes, which are defined next. 



Figure 4.1: A reference box. 


Figure 4.2: Triangles of the set $T_0$.

Definition 4.2 (Reference box). A reference box is a set of four line-point pairs $(\ell_i, b_i)$ for $i = 0, 1, 2, 3$, where $\ell_0, \ell_2$ are distinct horizontal lines, such that $\ell_0$ is above $\ell_2$, and $\ell_1, \ell_3$ are distinct vertical lines, such that $\ell_1$ is to the left of $\ell_3$. The reference box defines a vertex set $B_0 = \{b_0, b_1, b_2, b_3\}$ and a triangle set $T_0$, formed by removing the quadrilateral $b_0 b_1 b_2 b_3$ from the rectangle delineated by the lines $\ell_0, \ell_1, \ell_2, \ell_3$. See Figures 4.1-4.2.

Note that line-point pairs do not depend on the input. Intuitively, by picking a reference box, we decide to keep the area inside the quadrilateral $b_0 b_1 b_2 b_3$ black, the area outside the rectangle formed by $\ell_0, \ell_1, \ell_2, \ell_3$ white, and the triangles in $T_0$ gray, i.e., undecided for now.

Definition 4.3. For points $x, y$, let $\ell(x, y)$ denote the line that passes through $x$ and $y$. Let $xy$ denote the line segment between $x$ and $y$ and $|xy|$ denote the length of $xy$.

Reference polygons are defined next. Intuitively, to obtain a reference polygon, we keep subdividing “gray” triangles in $T_0$ into smaller triangles and deciding to color the smaller triangles black or white or keep them gray (i.e., undecided for now). We also allow “cutting off” a quadrilateral that is adjacent to black and coloring it black (a.k.a. “the base change operation”). The main recoloring operation from Definition 4.4 is illustrated in Figure 4.3. Even though the definition of reference polygons is somewhat technical, the readers can check their understanding of this concept by following Algorithm 2, as it chooses the best reference polygon to approximate the input image.

Definition 4.4 (Reference polygon). A reference polygon is an image of a polygon Hull(B), where the set $B$ can be obtained from a reference box with a vertex set $B_0$ and a triangle set $T_0$ by the following recursive process. Initially, $T_{end} = \emptyset$ and $B = B_0$. While $T_0 \ne \emptyset$, move a triangle $T$ from $T_0$ to $T_{end}$ and perform the following steps:

1. (Base Change). Let $T = \triangle b'b''v$, where $b', b'' \in B$. Select a reference point $b'_0$ on $b'v$ w.r.t. line $\ell(b', v)$, and a reference point $b''_0$ on $b''v$ w.r.t. line $\ell(b'', v)$. Add $b'_0, b''_0$ to $B$. (This corresponds to coloring the quadrilateral $b'b'_0 b''_0 b''$ black.) Let $h$ be the height of $\triangle b'_0 b''_0 v$ w.r.t. the base $b'_0 b''_0$.

2. (Subdivision Step) If $h > 6\epsilon_0 n$, choose whether to proceed with this step or go to Step 3 (both choices correspond to a legal reference polygon); otherwise, go to Step 3. Let $\varphi$ be the angle between $\ell(b'_0, b''_0)$ and the x-axis, and $\hat\varphi \in D_\epsilon$ be such that $|\hat\varphi - \varphi| \le \epsilon_0/2$. Select a reference line-point pair $(\ell, b)$, where the line $\ell \in L_{\hat\varphi}$ crosses $b'_0 v$ and $b''_0 v$, whereas $b$ is in the triangle $\triangle b'_0 b''_0 v$. Let $v'$ (resp., $v''$) be the point of intersection of $\ell$ and $b'_0 v$ (resp., $\ell$ and $b''_0 v$). Let $T' = \triangle b'_0 b v'$, $T'' = \triangle b''_0 b v''$, as shown in Figure 4.3. Add $b$ to $B$ and triangles $T', T''$ to $T_0$. (This represents coloring $\triangle b'_0 b''_0 b$ black and keeping $T'$ and $T''$ gray.)

3. (End of Processing) Do nothing. (This represents coloring $\triangle b'_0 b''_0 v$ white.)

By Remark 2.1, to prove Theorem 4.1, it suffices to design a Bernoulli tester that takes $s = O(\frac{1}{\epsilon^2}\log\frac{1}{\epsilon})$ samples in expectation and runs in time $O(\frac{1}{\epsilon^8})$. Our Bernoulli tester is Algorithm 2. In Algorithm 2, we use the following notation for the (relative) empirical error with respect to an input image $M$, a set of sampled pixels $S$, and the size parameter $s$. For an image $M'$, let $\hat d(M') = \frac{1}{s} \cdot |\{u \in S : M[u] \ne M'[u]\}|$. For every region $R \subseteq [0..n)^2$, we let $\hat d_+(R) = \frac{1}{s} \cdot |\{u \in S \cap R : M[u] = 0\}|$, and $\hat d_-(R) = \frac{1}{s} \cdot |\{u \in S \cap R : M[u] = 1\}|$, i.e., the empirical error if we make $R$ black/white, respectively.
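The two region errors can be illustrated with a short sketch. It assumes the convention used above (1 is black, 0 is white); the representation of the image as a list of lists, of the region as a predicate, and the function name are assumptions made only for this example.

```python
def empirical_errors(image, samples, in_region, s):
    """Return (d_plus, d_minus) for a region.

    image     : 2D list of 0/1 values (assumed: 1 = black, 0 = white)
    samples   : iterable of sampled pixel coordinates (i, j), the set S
    in_region : predicate (i, j) -> bool describing the region R
    s         : the size parameter used for normalization
    """
    white = sum(1 for (i, j) in samples if in_region(i, j) and image[i][j] == 0)
    black = sum(1 for (i, j) in samples if in_region(i, j) and image[i][j] == 1)
    # d_plus: error if R is declared black; d_minus: error if R is declared white.
    return white / s, black / s
```

Note that the normalization is by the parameter $s$ rather than by $|S|$, matching the definitions above.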

Subroutine Best, presented next, chooses the option with the smallest empirical relative error among those given in Definition 4.4, items 1-3.

Algorithm 2: Bernoulli approximation algorithm for distance to convexity.
input : parameters $n \in \mathbb{N}$, $\epsilon \in (0, 1/4)$; Bernoulli access to an $n \times n$ binary matrix $M$.

1 Set $s = \Theta(\frac{1}{\epsilon^2}\log\frac{1}{\epsilon})$. Include each image pixel in the sample $S$ w.p. $p = s/n^2$.
  // Run the algorithm to find $\hat d$, the smallest fraction of samples misclassified by a reference polygon in $P_\epsilon$. A dynamic programming implementation of the algorithm is given in Section 4.3.
2 Let $W_{\ell_0}$ (resp., $W_{\ell_2}$) be the set of pixels of the image $M$ that lie either above $\ell_0$ or to the left of $b_0$ on $\ell_0$ (resp., either below $\ell_2$ or to the left of $b_2$ on $\ell_2$). Let $W_{\ell_1}$ (resp., $W_{\ell_3}$) be the set of pixels of $M - W_{\ell_0} - W_{\ell_2}$ to the left of $\ell_1$ (resp., to the right of $\ell_3$). (See Figure 4.4.)
3 Set $\hat d = 1$.
4 forall line-point pairs $(\ell_0, b_0)$, $(\ell_2, b_2)$, where $\ell_0, \ell_2$ are horizontal lines do
5     Set $d_{left} = 1$. // The variable to compute the best error for the region to the left of $b_0 b_2$, between $\ell_0$ and $\ell_2$.
6     foreach line-point pair $(\ell_1, b_1)$, where $\ell_1$ is a vertical line do
7         Let $v_0$ (resp., $v_2$) be the point where $\ell_1$ intersects $\ell_0$ (resp., $\ell_1$ intersects $\ell_2$).
8         $d_{left} = \min(d_{left}, \hat d_-(W_{\ell_1}) + \hat d_+(\triangle b_0 b_1 b_2) + Best(\triangle b_0 b_1 v_0) + Best(\triangle b_1 b_2 v_2))$
9     Similarly to Steps 5-8, compute $d_{right}$. // The best error for the region to the right of $b_0 b_2$, between $\ell_0$ and $\ell_2$.
10    Compute $\hat d = \min(\hat d, \hat d_-(W_{\ell_0} \cup W_{\ell_2}) + d_{left} + d_{right})$.
11 return $\hat d$.
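Step 1 of Algorithm 2 (Bernoulli sampling) can be sketched as follows. This illustrative version loops over all $n^2$ pixels, so it takes time linear in the image size; it is meant only to show the sampling model, and the function name and the optional random generator argument are assumptions of this sketch.

```python
import random

def bernoulli_sample(n, s, rng=None):
    """Include each pixel of an n x n image independently with probability p = s / n^2."""
    rng = rng or random.Random()
    p = min(1.0, s / (n * n))
    # The expected number of sampled pixels is p * n^2 = s (when s <= n^2).
    return [(i, j) for i in range(n) for j in range(n) if rng.random() < p]
```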


Algorithm 3: Subroutine Best used in Algorithm 2.
input : triangle $\triangle b'b''v$

// Use dynamic programming (see Section 4.3 for implementation details).
1 Set $d^* = 1$.
2 forall reference points $b'_0$ and $b''_0$ on the sides $b'v$ and $b''v$, respectively, do
3     Compute $d^* = \min(d^*, \hat d_+(b'b''b''_0 b'_0) + BestForFixedBase(\triangle b'_0 b''_0 v))$
4 return $d^*$


Algorithm 4: Subroutine BestForFixedBase used in Algorithm 3.
input : triangle $\triangle b'_0 b''_0 v$

1 Set $d^* = \hat d_-(\triangle b'_0 b''_0 v)$.
2 if the height of $\triangle b'_0 b''_0 v$ w.r.t. the base $b'_0 b''_0$ is more than $6\epsilon_0 n$ then
3     foreach line-point pair $(\ell, b)$, where $\ell \in L_{\hat\varphi}$ (see Definition 4.4, item 2), $b \in \triangle b'_0 b''_0 v$, and line $\ell$ intersects the side $b'_0 v$ at some point $v'$ and the side $b''_0 v$ at some point $v''$, do
4         Compute $d^* = \min(d^*, \hat d_-(\triangle v'v''v) + \hat d_+(\triangle b'_0 b''_0 b) + Best(\triangle b'_0 b v') + Best(\triangle b b''_0 v''))$
5 return $d^*$

Figure 4.3: An illustration to Definition 4.4: triangle $\triangle b'b''v$.

Figure 4.4: Regions $W_{\ell_0}$, $W_{\ell_1}$, $W_{\ell_2}$, and $W_{\ell_3}$.


Our set of reference polygons has two critical features. First, for each convex image there is a nearby reference polygon. This is proved in Section 4.1. It turns out that the empirical error for a region is proportional to the square root of its area. The second key feature of our reference polygons is that, for each of them, the set of considered triangles, $T_{end}$, has small $\sum_{T \in T_{end}} \sqrt{A(T)}$. The proof of this fact, as well as the analysis of the empirical error, appears in Section 4.2. Finally, Section 4.3 completes the analysis of the algorithm, gives details of its implementation and presents the corollary about agnostic PAC learning of convex objects.

4.1 Existence of a nearby reference polygon 

Lemma 4.2. For every convex image $M$, there exists $M' \in P_\epsilon$ such that $dist(M, M') \le \epsilon/6$.

Proof. Consider a convex image $M$. We will show how to construct a nearby reference polygon $M'$ using the recursive process in Definition 4.4. First, we obtain a reference box (see Definition 4.2) for $M$ as follows. Let $(\ell_0, b_0)$ be a line-point pair, where $b_0$ is black in $M$ and $\ell_0$ is the topmost horizontal line that contains such a reference point. Similarly, define $(\ell_2, b_2)$, replacing “topmost” with “bottommost”. Analogously, define the two line-point pairs $(\ell_1, b_1)$, $(\ell_3, b_3)$ with vertical lines. The four line-point pairs $(\ell_i, b_i)$ for $i \in [0..4)$ define the reference box for $M$, as shown in Figure 4.5.

Next we construct the set $B$ from the reference box, as in Definition 4.4. We also maintain two sets of line segments, $F_1$ and $F_2$, that are used in the analysis. Initially, $F_1 = F_2 = \emptyset$. The colors of the points in the description below are with respect to the convex image $M$. This is how we make the choices at each step of the recursive process in Definition 4.4 to obtain our reference polygon:

1. (Base Change) Choose $b'_0, b''_0$ to be the black reference points furthest from $b'b''$ w.r.t. lines $\ell(b', v)$ and $\ell(b'', v)$, respectively. Recall that $h$ is the height of $\triangle b'_0 b''_0 v$ w.r.t. the base $b'_0 b''_0$.

2. (Subdivision Step) If $h > 6\epsilon_0 n$, let $B_M$ denote the convex hull of all black pixels in $M$ and points $b'_1, b''_1$ be the intersection points of $B_M$ with $b'_0 v$ and $b''_0 v$, respectively. Choose a line-point pair $(\ell, b)$ such that $\ell \in L_{\hat\varphi}$ is the line furthest from $b'_1 b''_1$ that intersects $b'_0 v$ and $b''_0 v$, and $b$ is black. Let $\ell$ intersect $b'_0 v$ and $b''_0 v$ at $v'$ and $v''$, respectively, and let it intersect $B_M$ at $y'$ and $y''$, as in Figure 4.6. Put the line segment $y'y''$ in $F_1$ and $\triangle v'v''v$ in $T_{cut}$. If no line in $L_{\hat\varphi}$ contains a black reference point in $\triangle b'_1 b''_1 v$ or if $h \le 6\epsilon_0 n$, go to Step 3.

3. (End of Processing) Put the line segment $b'_1 b''_1$ in $F_2$ and $\triangle b'_0 b''_0 v$ in $T_{fin}$. Triangle $\triangle b'_0 b''_0 v$ is not subdivided and is called a final triangle.

Figure 4.5: Reference box for a convex image $M$.

Observe that $M$ and $M'$ differ only on three types of regions: outside of the reference box, inside the triangles in $T_{fin}$, and inside the triangles in $T_{cut}$. To bound $Dist(M, M')$, we prove in Claims 4.3, 4.4, and 4.7 that the number of disagreements in each of the three regions is small. For any region $R \subseteq [0..n)^2$, let $Err(R) = |\{u \in R : M[u] \ne M'[u]\}|$.

The next claim follows from the analysis of the convexity tester in [21].

Claim 4.3. The number of black pixels in $M$ outside the reference box is at most $12\epsilon_0 n^2$.

Claim 4.4. Let $\triangle b'_0 b''_0 v$ be a final triangle and points $b'_1, b''_1$ be the points of intersection of $B_M$ with $b'_0 v$ and $b''_0 v$, respectively. Then $Err(\triangle b'_0 b''_0 v) \le 4|b'_1 b''_1|\epsilon_0 n + 2$.

Proof. By Proposition 3.5, $Err(\triangle b'_0 b''_0 v) \le Pix(\triangle b'_1 b''_1 v) \le A(\triangle b'_1 b''_1 v) + Perim(\triangle b'_1 b''_1 v)/2 + 1$. Note that $\angle b'_1 v b''_1$ is obtuse.

Proposition 4.5. Let $T$ be a triangle with sides $a$, $b$ and $c$. Let $\alpha$ be the angle opposite to side $a$, and $h_a$ be the height w.r.t. base $a$ in $T$. If $\alpha \ge \pi/2$ then $h_a \le a/2$.

Proof. By the cosine theorem, $a^2 = b^2 + c^2 - 2bc\cos\alpha \ge b^2 + c^2 \ge 2bc \ge 4A(T) = 2 \cdot a \cdot h_a$. Thus, $h_a \le a/2$, as claimed. □
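Proposition 4.5 is easy to sanity-check numerically. The sketch below draws random triangles with a non-acute angle $\alpha$ between sides $b$ and $c$ and compares the height over the opposite side with half of that side; the function names, parameter ranges and tolerance are choices made only for this illustration.

```python
import math
import random

def side_and_height(b, c, alpha):
    """For a triangle with sides b, c enclosing angle alpha, return (a, h_a)."""
    a = math.sqrt(b * b + c * c - 2 * b * c * math.cos(alpha))  # cosine theorem
    area = 0.5 * b * c * math.sin(alpha)
    return a, 2 * area / a  # h_a = 2 * area / a

def check_proposition(trials=10000, seed=0):
    """Return True if h_a <= a/2 held (up to rounding) in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        b, c = rng.uniform(0.1, 10), rng.uniform(0.1, 10)
        alpha = rng.uniform(math.pi / 2, math.pi - 1e-3)
        a, h = side_and_height(b, c, alpha)
        if h > a / 2 + 1e-9:
            return False
    return True
```

The bound is tight for the right isosceles triangle ($b = c$, $\alpha = \pi/2$), where $h_a = a/2$.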

If $h \le 6\epsilon_0 n$ then by Proposition 4.5, the area $A(\triangle b'_1 b''_1 v) \le 3|b'_1 b''_1|\epsilon_0 n$. Since $Perim(\triangle b'_1 b''_1 v) \le 3|b'_1 b''_1|$, we obtain that $A(\triangle b'_1 b''_1 v) + Perim(\triangle b'_1 b''_1 v)/2 + 1 \le 3|b'_1 b''_1|\epsilon_0 n + 1.5|b'_1 b''_1| + 1 \le 4|b'_1 b''_1|\epsilon_0 n + 2$ and the claim holds (recall that $\epsilon = \Omega(1/n)$). Now assume that $h > 6\epsilon_0 n$ and no line in $L_{\hat\varphi}$ with a black reference point intersects the line segments $b'_1 v$ and $b''_1 v$ in $\triangle b'_0 b''_0 v$.

Figure 4.6: An illustration to Subdivision Step in $\triangle b'_0 b''_0 v$.

Figure 4.7: An illustration of triangle $\triangle y'y''\hat v$.

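Proposition 4.6. Let $\triangle b'_0 b''_0 v$ be a triangle in which $\angle b'_0 v b''_0$ is obtuse and $B_M$ intersects the sides $b'_0 v$ and $b''_0 v$. Let $\ell \in L_{\hat\varphi}$ be a line that intersects $B_M$ at $y'$ and $y''$, and intersects $b'_0 v$ and $b''_0 v$ at $v'$ and $v''$, respectively. See Figure 4.7. Then $Err(\triangle v'v''v) \le \frac{|y'y''|^2}{4} + \frac{3|y'y''|}{2} + 1$.

Proof. Let $\hat v$ be a point (inside $\triangle b'_0 b''_0 v$) such that $\ell(y', \hat v)$ is parallel to $\ell(b'_0, v)$ and $\ell(y'', \hat v)$ is parallel to $\ell(b''_0, v)$. Since $B_M$ is convex, the portion of $B_M$ in $\triangle v'v''v$ is entirely inside $\triangle y'y''\hat v$. Angle $\angle y'\hat v y''$ is obtuse since $\angle y'\hat v y'' = \angle b'_0 v b''_0$. Then by Proposition 4.5, $A(\triangle y'y''\hat v) \le \frac{|y'y''|^2}{4}$. Note that $Perim(\triangle y'y''\hat v) \le 3|y'y''|$. Since $Err(\triangle v'v''v) \le Pix(\triangle y'y''\hat v)$, by Proposition 3.5, $Err(\triangle v'v''v) \le \frac{|y'y''|^2}{4} + \frac{3|y'y''|}{2} + 1$. □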


Let $\ell \in L_{\hat\varphi}$ be the line that does not intersect the line segment $b'_0 b''_0$ and that is closest to it. Let $\ell$ intersect the line segments $b'_0 v$ and $b''_0 v$ at $v'$ and $v''$. Then either $\angle v'v''v \le \angle b'_0 b''_0 v$ or $\angle v''v'v \le \angle b''_0 b'_0 v$. W.l.o.g. assume that $\angle v'v''v \le \angle b'_0 b''_0 v$. Let $\hat\ell$ be the line that is parallel to $\ell$ and that passes through point $b''_0$, as shown in Figure 4.8. Let $v'_1$ be the intersection point of $\hat\ell$ and the line segment $b'_0 v$. Denote the angle between $\ell$ and $\ell(b'_0, b''_0)$ by $\gamma$. The distance between $\ell$ and $\hat\ell$ is at most $2\epsilon_0 n$. Otherwise there are two distinct lines from $L_{\hat\varphi}$ that pass through the line segment $b'_0 b''_0$. Since $|b'_0 v'_1| < \epsilon_0 n$, the distance between the two lines is less than $\epsilon_0 n$, a contradiction.

Figure 4.8: An illustration of line $\hat\ell$ in $\triangle b'_0 b''_0 v$.

Now we find an upper bound on the number of black pixels in $\triangle b'_0 b''_0 v$. Let $B_M$ intersect $\ell$ at $y'$ and $y''$. Then $|y'y''| \le \epsilon_0 n$. By Proposition 4.6, $Err(\triangle v'v''v) \le \frac{(\epsilon_0 n)^2}{4} + \frac{3\epsilon_0 n}{2} + 1$. The number of black pixels in the quadrilateral $b'_0 b''_0 v''v'$ is at most $Pix(b'_0 b''_0 v''v')$. The area

$A(b'_0 b''_0 v''v') = A(v'_1 b''_0 v''v') + A(\triangle b'_0 b''_0 v'_1) \le 2\epsilon_0 n|v'_1 b''_0| + A(\triangle b'_0 b''_0 v'_1) \le 2|b'_0 b''_0|\epsilon_0 n + A(\triangle b'_0 b''_0 v'_1)$.

The last inequality holds since $\angle b'_0 v'_1 b''_0$ is obtuse. Let $d_1$ (resp., $d_2$) denote the distance from the point $v'$ (resp., $v'_1$) to the line $\ell(b'_1, b''_1)$. We find an upper bound on $A(\triangle b'_0 b''_0 v'_1)$:


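$A(\triangle b'_0 b''_0 v'_1) = \frac{|b'_0 b''_0| \cdot |b''_0 v'_1| \cdot \sin\gamma}{2} \le \frac{|b'_0 b''_0| \cdot \sqrt{2}\,n(\epsilon_0/2)}{2} \le 0.4|b'_0 b''_0| \cdot \epsilon_0 n$.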


Thus, $A(b'_0 b''_0 v''v') \le 2.4 \cdot |b'_0 b''_0|\epsilon_0 n$. The height $h \le d_1 + \epsilon_0 n$. By Proposition 4.5, if $|b'_1 b''_1| < 10\epsilon_0 n$ then $d_1 < 5\epsilon_0 n$. It implies that $h < 6\epsilon_0 n$, a contradiction. Therefore, $|b'_1 b''_1| \ge 10\epsilon_0 n$. By the triangle inequality, $|b'_0 b''_0| \le |b'_1 b''_1| + 2\epsilon_0 n$. Thus,

$A(b'_0 b''_0 v''v') \le 2.4 \cdot (|b'_1 b''_1| + 2\epsilon_0 n)\epsilon_0 n \le 3 \cdot |b'_1 b''_1|\epsilon_0 n$.

The last inequality holds since $|b'_1 b''_1| \ge 10\epsilon_0 n$. Note that

$Perim(b'_0 b''_0 v''v')/2 \le 2 \cdot |b'_0 b''_0| \le 2|b'_1 b''_1| + 4\epsilon_0 n$.

Thus, by Proposition 3.5,

$Pix(b'_0 b''_0 v''v') \le 3 \cdot |b'_1 b''_1|\epsilon_0 n + 2|b'_1 b''_1| + 4\epsilon_0 n + 1$

and

$Err(\triangle b'_0 b''_0 v) \le Pix(b'_0 b''_0 v''v') + Err(\triangle v'v''v) \le 3 \cdot |b'_1 b''_1|\epsilon_0 n + 2|b'_1 b''_1| + 4\epsilon_0 n + 1 + \frac{(\epsilon_0 n)^2}{4} + \frac{3\epsilon_0 n}{2} + 1 \le 4 \cdot |b'_1 b''_1|\epsilon_0 n + 2$.

The last inequality holds since $|b'_1 b''_1| \ge 10\epsilon_0 n$. This completes the proof of Claim 4.4. □

Claim 4.7. Let triangle $\triangle b'_0 b''_0 v$ and line $\ell$ be as defined in Step 2 of the recursive construction of $M'$. Let $v'$ and $v''$ denote the points of intersection of $\ell$ and $b'_0 v$ and $b''_0 v$, respectively. Let $y'$ and $y''$ be the points of intersection of $B_M$ and $\ell$. Then $Err(\triangle v'v''v) \le 4 \cdot |y'y''|\epsilon_0 n + 2$.


Figure 4.9: An illustration of line $\ell'$ in triangle $\triangle b'_0 b''_0 v$.


Proof. If $|y'y''| \le \epsilon_0 n$ then, by Proposition 4.6, $Err(\triangle v'v''v) \le \frac{|y'y''|^2}{4} + \frac{3|y'y''|}{2} + 1 \le 4 \cdot |y'y''|\epsilon_0 n + 2$.

Now assume that $|y'y''| > \epsilon_0 n$. Let $\ell' \in L_{\hat\varphi}$ be the line at distance $\epsilon_0 n$ from $\ell$ closer to $v$, as in Figure 4.9. Let $\ell'$ intersect $b'_0 v$ and $b''_0 v$ at $v'_1$ and $v''_1$, respectively. Then $Err(v'v'_1 v''_1 v'')$ is at most the number of black pixels in $v'v'_1 v''_1 v''$. Note that all black pixels in $v'v'_1 v''_1 v''$ are inside a rectangle with length $|y'y''|$. Thus, by Proposition 3.5, the number of black pixels in $v'v'_1 v''_1 v''$ is at most $|y'y''|\epsilon_0 n + 2|y'y''| + 1$. The distance between the points of intersection of $B_M$ with $\ell'$ is at most $\epsilon_0 n$. Thus, by Proposition 4.6,

$Err(\triangle v'v''v) \le |y'y''|\epsilon_0 n + 2|y'y''| + 1 + \frac{(\epsilon_0 n)^2}{4} + \frac{3\epsilon_0 n}{2} + 1 \le 4 \cdot |y'y''|\epsilon_0 n + 2$.

The last inequality holds since $|y'y''| > \epsilon_0 n$. This completes the proof of Claim 4.7. □

Observe that all points in $B$ lie on the boundary of a convex polygon. Images $M$ and $M'$ differ only on pixels outside of the reference box and inside the triangles $\triangle b'_0 b''_0 v$ and $\triangle v'v''v$. All the line segments in $F_1 \cup F_2$ are the sides of a convex polygon which is inside an $n \times n$ square. Thus, the sum of the lengths of the line segments in $F_1 \cup F_2$ is at most $4n$. Now we find an upper bound on $|T_{fin}|$. Note that in the process of constructing a reference polygon starting from triangles in $T_0$, every triangle is subdivided into at most two new triangles. Fix a triangle $T \in T_0$. Consider a binary tree $B_T$ rooted at $T$, where every node is some triangle obtained during the reference polygon construction and every triangle in $B_T$ has at most two children triangles obtained after subdivision of their parent (during the construction). Triangles in $T_{fin}$ correspond to the leaves of the binary tree. Thus, to upper bound $|T_{fin}|$ we need to find the maximum possible height of $B_T$ and we need to assume that the tree is full. Recall $\triangle b'_0 b''_0 v$ and $h$ from the construction of a reference polygon. Triangle $\triangle b'_0 b''_0 v$ is not subdivided if $h \le 6\epsilon_0 n$. By Proposition 4.5, if $A(\triangle b'_0 b''_0 v) \le 36(\epsilon_0 n)^2$ then $h \le 6\epsilon_0 n$. Thus, a triangle is not subdivided if its area drops below $36(\epsilon_0 n)^2$. Note that every triangle in $T_0$ has area at most $n^2$. Consider a triangle $T_1$ in $B_T$ with two children $T'_1$ and $T''_1$. Let $k$ be the height of $B_T$. By Claim 4.9, $\max\{A(T'_1), A(T''_1)\} \le \frac{2}{3}A(T_1)$. Thus, every triangle in level $i \in [k]$ of $B_T$ has area at most $(2/3)^i n^2$. The area of every triangle in level $k - 1$ of $B_T$ is at least $36(\epsilon_0 n)^2$ (otherwise, none of the triangles in this level is subdivided and the height of $B_T$ cannot be $k$). We obtain that $(2/3)^{k-1} n^2 \ge 36(\epsilon_0 n)^2$ and thus, $k \le 5 \cdot \ln\frac{1}{\epsilon_0}$. Therefore, the number of leaves in $B_T$ is at most $2^k \le n/4$ (recall that $\epsilon \ge n^{-1/5}$) and $|T_{fin}| \le 4 \cdot (n/4) = n$. By Claims 4.3, 4.4 and 4.7,

$Dist(M, M') \le \Big(\sum_{b'_1 b''_1 \in F_2} |b'_1 b''_1| + \sum_{y'y'' \in F_1} |y'y''|\Big) \cdot 4\epsilon_0 n + 12\epsilon_0 n^2 + 2|T_{fin}| \le 26\epsilon_0 n^2 + 2n \le 27\epsilon_0 n^2$.

This completes the proof of Lemma 4.2. □

4.2 Error analysis 

Lemma 4.8. For each set $T_{end}$ obtained in the construction of a reference polygon in Definition 4.4,

$\sum_{T \in T_{end}} \sqrt{A(T)} \le 11n$.



Proof. All triangles in $T_{end}$ are obtained by partitioning the four initial triangles in $T_0$. The following claim analyzes how the area is affected by one step of partitioning.

Claim 4.9. Let $T'$ and $T''$ be two gray triangles obtained from a triangle $T$ in Subdivision Step of Definition 4.4. Then $\sqrt{A(T')} + \sqrt{A(T'')} \le \sqrt{\frac{2}{3} \cdot A(T)}$.

Proof. Observe that $\sqrt{A(T')} + \sqrt{A(T'')}$ is maximized when $b'_0 = b'$ and $b''_0 = b''$. W.l.o.g. we prove the claim for this case. We use notation from Figure 4.10. Recall that a triangle $T$ is partitioned only if its height $h > 6\epsilon_0 n$. Since the sides of $T$ are of length at most $\sqrt{2}n$, the height is that large only if both angles adjacent to the base $b'_0 b''_0$ are greater than $4\epsilon_0$. (To see this, consider an angle $\alpha$ between the base and a side of length $a$. We get $6\epsilon_0 n < h = a \cdot \sin\alpha \le \sqrt{2}n \cdot \alpha$. Thus, $\alpha > 6\epsilon_0/\sqrt{2} > 4\epsilon_0$.)

First, we find the maximum value of $\sqrt{A(T')} + \sqrt{A(T'')}$ for a fixed line $\ell$ on which the position of point $b$ varies. Let $\alpha = \angle b''_0 b'_0 v$, $\beta = \angle b'_0 b''_0 v$ and $\gamma$ be the angle between lines $\ell$ and $\ell(b'_0, b''_0)$. W.l.o.g. assume that $\angle v'v''v \le \beta$. Then $\angle v''v'v = \alpha + \gamma$ and $\angle v'v''v = \beta - \gamma$. By the construction of triangles in $T_{end}$, $\alpha + \beta < \frac{\pi}{2}$ and $\gamma \le \frac{\epsilon_0}{2}$. Let $q = |b'_0 v|$, $r = |b''_0 v|$, $t = |v'v''|$ and $qx = |v'v|$, $ry = |v''v|$, $tz = |bv''|$ $(x, y, z \in [0, 1])$. Let

$f(z) = \sqrt{A(T')} + \sqrt{A(T'')} = \sqrt{\frac{q(1-x) \cdot t(1-z) \cdot \sin(\alpha + \gamma)}{2}} + \sqrt{\frac{r(1-y) \cdot tz \cdot \sin(\beta - \gamma)}{2}}$.

Thus, $f(z) = \sqrt{C_1 \cdot (1 - z)} + \sqrt{C_2 \cdot z}$, where $C_1 = A(\triangle b'_0 v'v'')$, $C_2 = A(\triangle b''_0 v'v'')$ are constants. By the Cauchy-Schwarz inequality, $f(z) = \sqrt{C_1 \cdot (1 - z)} + \sqrt{C_2 \cdot z} \le \sqrt{(C_1 + C_2)(1 - z + z)} = \sqrt{C_1 + C_2}$.

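Next, we find the maximum value of $C_1 + C_2$ varying the position of $\ell$ inside $T$. We use the fact that

$C_1 = A(\triangle b'_0 v''v) - A(\triangle v'v''v) = \frac{(q - qx)ry \cdot \sin(\alpha + \beta)}{2}$,

$C_2 = A(\triangle b''_0 v'v) - A(\triangle v'v''v) = \frac{(r - ry)qx \cdot \sin(\alpha + \beta)}{2}$

to obtain

$C_1 + C_2 = \frac{(x + y - 2xy)qr \cdot \sin(\alpha + \beta)}{2} = (x + y - 2xy)A(T)$.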


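We need to show that $x + y - 2xy \le 2/3$. Let $c = \frac{y}{x} = \frac{\sin\beta\,\sin(\alpha + \gamma)}{\sin\alpha\,\sin(\beta - \gamma)}$. Since $c$ is constant and the geometric mean of two numbers is at most their arithmetic mean,

$\sqrt{x + y - 2xy} = \sqrt{2c} \cdot \sqrt{x\Big(\frac{1 + c}{2c} - x\Big)} \le \sqrt{2c} \cdot \frac{1}{2} \cdot \Big(x + \frac{1 + c}{2c} - x\Big) = \frac{1 + c}{\sqrt{8c}}$.

We prove that $\frac{1 + c}{\sqrt{8c}} \le \sqrt{\frac{2}{3}}$, which is equivalent to $(3c - 1)(c - 3) \le 0$. The latter inequality holds if $1 \le c \le 3$. Function $\sin\theta$ is increasing on $[0, \pi/2]$; thus, $1 \le c$. Now we show that $c \le 3$. If $\gamma = 0$ the inequality holds. Let us assume that $\gamma > 0$. We need to prove that

$\frac{\sin\beta\,\sin(\alpha + \gamma)}{\sin\alpha\,\sin(\beta - \gamma)} = \frac{\cot\gamma + \cot\alpha}{\cot\gamma - \cot\beta} \le 3$.

Function $\cot\theta$ is decreasing on $(0, \pi/2]$; thus,

$\frac{\cot\gamma + \cot\alpha}{\cot\gamma - \cot\beta} \le \frac{\cot\gamma + \cot 4\epsilon_0}{\cot\gamma - \cot 4\epsilon_0} \le \frac{\cot\gamma + \cot\epsilon_0}{\cot\gamma - \cot\epsilon_0} \le 3$.

The last inequality is equivalent to $2\cot\epsilon_0 \le \cot\gamma$, which is true since $2\cot\epsilon_0 \le \cot\frac{\epsilon_0}{2} \le \cot\gamma$. This completes the proof of Claim 4.9.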
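The algebraic core of the proof of Claim 4.9 can be checked numerically: with $y = cx$, the quantity $x + y - 2xy$ is a concave quadratic in $x$ whose maximum is $(1 + c)^2/(8c)$, and this maximum stays at most $2/3$ for $1 \le c \le 3$. The sketch below (with a hypothetical helper name and an arbitrary grid over $c$) evaluates that maximum.

```python
def max_over_x(c):
    """Maximum over real x of x + c*x - 2*c*x*x, attained at x = (1 + c) / (4*c)."""
    x = (1 + c) / (4 * c)
    return x + c * x - 2 * c * x * x

# Evaluate on a grid of values c in [1, 3]; the largest value occurs at c = 3.
worst = max(max_over_x(1 + 2 * i / 1000) for i in range(1001))
```

At $c = 3$ the maximum equals $2/3$ exactly, so the constant in the claim cannot be improved by this argument alone.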

Let $A_1, \ldots, A_4$ be the areas of the first four triangles in $T_0$. Then $\sum_{i=1}^{4} A_i \le n^2$. By construction of triangles in $T_{end}$, Claim 4.9, and concavity of the square root function,

$\sum_{T \in T_{end}} \sqrt{A(T)} \le K \cdot \sum_{j=1}^{4} \sqrt{A_j} \le 2K\sqrt{A_1 + A_2 + A_3 + A_4} \le 2K \cdot n$,

where $K = \sum_{m=0}^{\infty} (\sqrt{2/3})^m = (1 - \sqrt{2/3})^{-1} < 5.5$. This completes the proof of Lemma 4.8. □
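As a quick arithmetic check of the constant $K$ used above, the following sketch compares the closed form of the geometric series with a truncated sum (the truncation length is an arbitrary choice for the illustration).

```python
import math

ratio = math.sqrt(2 / 3)                       # per-level shrink factor from Claim 4.9
K = 1 / (1 - ratio)                            # closed form of the geometric series
partial = sum(ratio ** m for m in range(200))  # truncated series, for comparison
```

Since $K < 5.5$, the bound $2K \cdot n$ is below $11n$, as stated in Lemma 4.8.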
Let $M$ be an input image, $S$ be the set of samples obtained by the algorithm, and $s$ be the parameter in the algorithm. For any image $M'$, let $d(M') = dist(M, M')$ and $\hat d(M') = \frac{1}{s} \cdot |\{u \in S : M[u] \ne M'[u]\}|$. Also, for any region $R \subseteq [0..n)^2$, let $d(M'|_R) = \frac{1}{n^2} \cdot |\{u \in R : M[u] \ne M'[u]\}|$ and $\hat d(M'|_R) = \frac{1}{s} \cdot |\{u \in S \cap R : M[u] \ne M'[u]\}|$.

Lemma 4.10. With probability at least 2/3 over the choice of the samples taken by Algorithm 2, $|\hat d(M') - dist(M, M')| \le 5\epsilon/6$ for all reference polygons $M'$.

Proof. Consider a region $R = (R_+, R_-)$, partitioned into two regions $R_+$ and $R_-$ such that in some step of the algorithm we are checking the assumption that $R_+$ is black and $R_-$ is white, i.e., evaluating $\hat d_+(R_+) + \hat d_-(R_-)$. Let $\mathcal{R}$ be the set of all such regions $R$. We will show that with probability at least 2/3, the estimates $\hat d_+(R_+) + \hat d_-(R_-)$ are accurate on all regions in $\mathcal{R}$.

Fix $R = (R_+, R_-) \in \mathcal{R}$. Let $\Gamma$ be the set of misclassified pixels in $R$, i.e., pixels in $R_+$ which are white in $M$ and pixels in $R_-$ which are black in $M$. Define $\gamma = |\Gamma|/n^2$. Algorithm 2 approximates $\gamma$ by $\hat d_+(R_+) + \hat d_-(R_-) = \frac{1}{s}|\Gamma \cap S|$. Equivalently, it uses the estimate $\frac{1}{p}|\Gamma \cap S|$ for $|\Gamma|$ (recall that $p = s/n^2$). The error of the estimate is $err_S(R) = \frac{1}{p}|\Gamma \cap S| - |\Gamma|$.

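Claim 4.11. $\Pr[|err_S(R)| > \sqrt{\gamma} \cdot c\epsilon n^2] \le 2\exp(-\frac{3}{8}c^2\epsilon^2 s)$, where $c = 1/21$.

Proof. For each pixel $u$, we define random variables $\chi_u$ and $X_u$, where $\chi_u$ is the indicator random variable for the event $u \in S$ (i.e., a Bernoulli variable with the probability parameter $p$), whereas $X_u = \frac{\chi_u}{p} - 1$. Then our estimate of $|\Gamma|$ is $\frac{1}{p}|\Gamma \cap S| = \frac{1}{p}\sum_{u \in \Gamma} \chi_u$, whereas $err_S(R) = \sum_{u \in \Gamma} X_u$. We use Bernstein inequality (Theorem 4.13) with parameters $m = \gamma n^2$ and $z = \sqrt{\gamma} \cdot c\epsilon n^2$ to bound $\Pr[\sum_{u \in \Gamma} X_u > \sqrt{\gamma} \cdot c\epsilon n^2]$. The variables $X_u$ are identically distributed. The maximum value of $|X_u|$ is $a = \frac{1-p}{p}$. Note that $E[X_u^2] = \frac{1}{p^2}E[(\chi_u - p)^2] = \frac{1}{p^2}Var[\chi_u] = \frac{1-p}{p} = a$. We assume w.l.o.g. that $z \le |\Gamma|$. (If $z > |\Gamma|$ then $\sum_{u \in \Gamma} \chi_u$ can never exceed $z$, and the probability we are bounding is 0.) By Bernstein inequality,

$\Pr\Big[\sum_{u \in \Gamma} X_u > z\Big] \le \exp\Big(\frac{-z^2/2}{a|\Gamma| + a \cdot z/3}\Big) \le \exp\Big(-\frac{3}{8} \cdot \frac{z^2 p}{|\Gamma|}\Big) = \exp\Big(-\frac{3}{8} \cdot \frac{\gamma \cdot c^2\epsilon^2 n^4 s}{\gamma n^2 n^2}\Big) = \exp\Big(-\frac{3}{8}c^2\epsilon^2 s\Big)$.

The second inequality holds because $a \le 1/p$ and $z \le |\Gamma|$. The equalities are obtained by substituting the expressions for $z$, $|\Gamma|$, and $p$, and simplifying. By symmetry, $\Pr[|err_S(R)| > z] \le 2\exp(-\frac{3}{8}c^2\epsilon^2 s)$. □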

Claim 4.12. The number of regions in $\mathcal{R}$ is at most $50/\epsilon_0^8$.

Proof. Let $k = 1/\epsilon_0$. There are four types of regions in $\mathcal{R}$, each corresponding to a different call of the form $\hat d_+(R_+) + \hat d_-(R_-)$ in the algorithm. The first type is a horizontal double strip of the form $R_+ = \emptyset$ and $R_- = W_{\ell_0} \cup W_{\ell_2}$. There are $\binom{k+1}{2}$ such strips. The second type is where $R_+$ is a black triangle $\triangle b_0 b_1 b_2$ (or $\triangle b_0 b_3 b_2$) and $R_-$ is a vertical strip $W_{\ell_1}$ (respectively, $W_{\ell_3}$). For each horizontal double strip, there are $2k - 1$ vertical strips. For each of them, there are $k$ ways to choose a reference point on the vertical line that delineates the strip. So, overall, there are $(k + 1)k(k - 1/2)k$ regions of type 2. Type 1 and 2 together have at most $.5k^8$ regions. Regions of type 3 are black quadrilaterals of the form $R_+ = b'b'_0 b''_0 b''$. Each quadrilateral is defined by two reference lines, $b'b'_0$ and $b''b''_0$, with two reference points on each. There are $\binom{2\pi k}{2}$ ways to choose reference directions for the two lines. For each of them, there are at most $\sqrt{2}k \cdot \binom{\sqrt{2}k}{2}$ ways to choose a reference line and two reference points. Overall, the number of quadrilaterals in $\mathcal{R}$ is at most $\pi^2 k^8$. Finally, regions of type 4 are contained in triangles of the form $\triangle v b'_0 b''_0$; they are of the form either $R_+ = \emptyset$, $R_- = \triangle v b'_0 b''_0$ or $R_+ = \triangle b b'_0 b''_0$, $R_- = \triangle v v'v''$. In the former case, regions are defined by two line-point pairs $(\ell(b'_0, v), b'_0)$ and $(\ell(b''_0, v), b''_0)$. There are $\binom{2\pi k}{2}$ pairs of reference directions. For each of them, there are at most $\sqrt{2}k$ choices for each reference line and $\sqrt{2}k$ choices for each reference point. In the latter case, they are defined by three reference line-point pairs: $(\ell(b'_0, v), b'_0)$, $(\ell(b''_0, v), b''_0)$, and $(\ell(v', v''), b)$, but the direction of the line through $v'v''$ is determined by $b'_0, b''_0$. As before, there are $\binom{2\pi k}{2}$ pairs of reference directions. For each of them, there are at most $\sqrt{2}k$ choices for each reference line and $\sqrt{2}k$ choices for each reference point. Overall, the number of regions of type 4 is upper-bounded by $4\pi^2 k^8$. Overall, $|\mathcal{R}| \le (5\pi^2 + .5)k^8 < 50k^8 = 50/\epsilon_0^8$, as claimed. □

By taking a union bound over all regions in $\mathcal{R}$ and applying Claims 4.11-4.12, we get that the probability that for one or more of them the error is larger than stated in Claim 4.11 is at most $|\mathcal{R}| \cdot 2\exp(-\frac{3}{8}c^2\epsilon^2 s) \le \frac{100}{\epsilon_0^8} \cdot \exp(-\frac{3}{8}c^2\epsilon^2 s) \le 1/3$, where the last inequality holds provided that $s \ge C \cdot \frac{1}{\epsilon^2}\ln\frac{1}{\epsilon}$ for some sufficiently large constant $C$. We get that

$\Pr[|err_S(R)| \le \sqrt{\gamma} \cdot c\epsilon n^2$ for all $R \in \mathcal{R}] \ge 2/3$. (1)
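To see what the union bound demands of $s$ in concrete terms, the sketch below solves $\frac{100}{\epsilon_0^8}\exp(-\frac{3}{8}c^2\epsilon^2 s) \le 1/3$ for $s$, using $c = 1/21$ and $\epsilon_0 = \epsilon/144$ from the text. The function names are hypothetical, and the resulting numbers only illustrate the (unoptimized) constants of this analysis.

```python
import math

def union_bound_failure(eps, s, c=1 / 21):
    """Upper bound (100 / eps0^8) * exp(-(3/8) * c^2 * eps^2 * s) on the failure probability."""
    eps0 = eps / 144
    return (100 / eps0 ** 8) * math.exp(-(3 / 8) * c * c * eps * eps * s)

def samples_for_union_bound(eps, c=1 / 21, target=1 / 3):
    """Smallest integer s for which the bound above is at most `target`."""
    eps0 = eps / 144
    return math.ceil(8 * math.log(100 / (target * eps0 ** 8)) / (3 * c * c * eps * eps))
```

The returned value grows as $\frac{1}{\epsilon^2}\ln\frac{1}{\epsilon}$, in line with the requirement on $s$ stated above.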

Now suppose that the event in (1) holds, that is, the error is low for all regions. Fix a reference polygon $M'$. Consider the partition of $M'$ into regions $R = (R_+, R_-) \in \mathcal{R}$ on which Algorithm 2 evaluates $\hat d_+(R_+) + \hat d_-(R_-)$ while implicitly computing $\hat d(M')$. Let $\mathcal{R}_{M'} \subseteq \mathcal{R}$ be the set of regions in the partition. Recall the four types of regions from the proof of Claim 4.12. Then $\mathcal{R}_{M'}$ contains one region of type 1 and two regions of type 2, defined by the reference box of $M'$. Denote their areas by $A'_1, A'_2, A'_3$. For each triangle $T \in T_{end}$ created during the construction of $M'$ in Definition 4.4, the set $\mathcal{R}_{M'}$ contains at most one region of type 3 and at most one region of type 4. They were implicitly colored, respectively, in Item 1 and Items 2-3 of Definition 4.4, when triangle $T$ was processed. Let $A_T$ and $A'_T$ denote their respective areas.

Recall that $A(R)$ denotes the area of $R$ and that an approximate (but precise enough for asymptotic analysis) upper bound on the number of misclassified pixels in $R$ is $A(R)$. Since the event in (1) holds,

$err_S(M') \le \sum_{R \in \mathcal{R}_{M'}} err_S(R) \le c\epsilon n \sum_{R \in \mathcal{R}_{M'}} \sqrt{A(R)} \le c\epsilon n\Big(\sum_{j=1}^{3} \sqrt{A'_j} + \sum_{T \in T_{end}} \big(\sqrt{A_T} + \sqrt{A'_T}\big)\Big)$.

Since $\sum_{j=1}^{3} A'_j \le n^2$ and $A_T + A'_T \le A(T)$ for all $T \in T_{end}$, by concavity of the square root function,

$\sum_{j=1}^{3} \sqrt{A'_j} \le \sqrt{3}\,n$ and $\sqrt{A_T} + \sqrt{A'_T} \le \sqrt{2A(T)}$.

We substitute these expressions in the previous inequality, use Lemma 4.8 and recall that $c = 1/21$:

$err_S(M') \le c\epsilon n\Big(\sqrt{3}\,n + \sqrt{2}\sum_{T \in T_{end}} \sqrt{A(T)}\Big) \le c\epsilon n^2(\sqrt{3} + 11\sqrt{2}) \le \frac{5}{6}\epsilon n^2$.

This holds for all reference polygons $M'$ as long as the event in (1) happens, i.e., with probability at least 2/3. This completes the proof of Lemma 4.10. □


For completeness, we state Bernstein’s inequality, which was used in the proof of Lemma 4.10.

Theorem 4.13 (Bernstein’s inequality). Let $X_1, \ldots, X_m$ be $m$ independent zero-mean random variables, where $|X_i| \le a$ for all $i \in [m]$. Then for all positive $z$,

$\Pr\Big[\sum_{i=1}^{m} X_i > z\Big] \le \exp\Big(\frac{-z^2/2}{\sum_{i=1}^{m} E[X_i^2] + a \cdot z/3}\Big)$.
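The following sketch evaluates the Bernstein bound and applies it to the variables $X_u = \chi_u/p - 1$ from the proof of Claim 4.11, where $a = E[X_u^2] = (1-p)/p$. It also computes the simplified bound $\exp(-\frac{3}{8}z^2 p/|\Gamma|)$ used there (valid when $z \le |\Gamma|$). The function names and the numeric parameters in the usage are assumptions made for illustration.

```python
import math

def bernstein_bound(z, second_moments, a):
    """exp(-(z^2 / 2) / (sum of E[X_i^2] + a * z / 3)), as in Theorem 4.13."""
    return math.exp(-(z * z / 2) / (sum(second_moments) + a * z / 3))

def claim_411_bounds(p, gamma_size, z):
    """Return (Bernstein bound, simplified bound) for gamma_size copies of X_u = chi_u/p - 1."""
    a = (1 - p) / p                      # both max |X_u| (for p <= 1/2) and E[X_u^2]
    exact = bernstein_bound(z, [a] * gamma_size, a)
    simplified = math.exp(-(3 / 8) * z * z * p / gamma_size)
    return exact, simplified
```

For example, `claim_411_bounds(0.01, 10000, 500)` gives a Bernstein bound that is no larger than the simplified one, matching the second inequality in the proof of Claim 4.11.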


4.3 Wrapping up: proof of Theorem 4.1 and corollary on agnostic learning 

Analysis of Algorithm 2. Let $d_M$ be the distance of $M$ to convexity. Then there exists a convex image $M^*$ such that $dist(M, M^*) = d_M$. By Lemma 4.2, there is a reference polygon $\hat M$ such that $dist(M^*, \hat M) \le \epsilon/6$, and consequently, $d_M \le dist(M, \hat M) \le d_M + \epsilon/6$. By Lemma 4.10, with probability at least 2/3 over the choice of the samples taken by Algorithm 2, $|\hat d(M') - dist(M, M')| \le 5\epsilon/6$ for all reference polygons $M'$. Suppose this event happened. Then $\hat d \ge d_M - 5\epsilon/6$ because $dist(M, M') \ge d_M$ for all convex images $M'$. Moreover, $\hat d(\hat M) \le dist(M, \hat M) + 5\epsilon/6 \le d_M + \epsilon$. Thus, $d_M - 5\epsilon/6 \le \hat d \le \hat d(\hat M) \le d_M + \epsilon$. That is, $|d_M - \hat d| \le \epsilon$ with probability at least 2/3, as required.


Sample and time complexity of Algorithm 2. The number of samples taken by the algorithm is $s = O(\epsilon^{-2}\log\epsilon^{-1})$.

Next we explain how to implement it to run in time $O(\epsilon^{-8})$. Refer to Figure 4.3. Each instance triangle $\triangle b'b''v$ of the dynamic programming in subroutine Best is specified by two line-point pairs: $(\ell(b', v), b')$, $(\ell(b'', v), b'')$. The number of line-point pairs is $O(\epsilon^{-3})$ because for each we select the reference direction, the shift of the line, and the reference point, each in $O(\epsilon^{-1})$ ways. Hence, we have $O(\epsilon^{-6})$ entries in the dynamic programming table for Best.

In the process of solving an instance of Best, we consider $O(\epsilon^{-2})$ possibilities for points $b'_0, b''_0$, that is, $O(\epsilon^{-8})$ possibilities over all instances. We show how to evaluate each of the possibilities in amortized time $O(1)$. For that, we count white and black sample pixels in each sub-area in Figure 4.3 in amortized time $O(1)$.

First, we show how to do it for the entire triangle $\triangle b'b''v$. We have $O(\epsilon^{-6})$ triangles that can be partitioned into $O(\epsilon^{-5})$ groups by specifying the first line-point pair $(\ell(b', v), b')$ and the second line (through $b''$ and $v$). That is, within each group, we vary only point $b''$ on the second line. We sort all sample points $p \in S$ according to the angle of the segment $pb'$. Similarly, we sort the reference points $b''$ on the second line according to the angle of the segment $b''b'$. After sorting, a single scan can establish the counts of white and black pixels in the triangles. Clearly, we can sort in time $o(\epsilon^{-3})$. Thus, we compute white/black counts for all instance triangles of Best in time $o(\epsilon^{-8})$.
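The counting step for one group can be sketched as follows. The sketch fixes the apex $b'$ and the point $v$, takes the candidate points $b''$ on one ray from $v$, sorts the samples by their angle at $b'$, and then answers each triangle query from prefix counts. It uses a binary search per query instead of the single merged scan described above, which costs an extra logarithmic factor but is shorter to write; all names and the data layout are assumptions of this illustration.

```python
import bisect
import math

def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive if b is to the left of o->a)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def angle_from(apex, ref, p):
    """Unsigned angle at apex between the rays apex->ref and apex->p."""
    ax, ay = ref[0] - apex[0], ref[1] - apex[1]
    bx, by = p[0] - apex[0], p[1] - apex[1]
    return math.atan2(abs(ax * by - ay * bx), ax * bx + ay * by)

def triangle_counts(apex, v, base_points, samples):
    """For each b2 in base_points (all on one ray from v), return (black, white)
    counts of samples inside the closed triangle (apex, b2, v).
    samples is a list of ((x, y), color) with color 1 = black, 0 = white."""
    if not base_points:
        return []
    far = max(base_points, key=lambda b: math.dist(v, b))
    side_first = cross(apex, v, far)    # side of line(apex, v) that holds the base points
    side_second = cross(v, far, apex)   # side of the second line that holds the apex
    keyed = []
    for p, color in samples:
        if cross(apex, v, p) * side_first < 0 or cross(v, far, p) * side_second < 0:
            continue                    # outside every triangle of this group
        keyed.append((angle_from(apex, v, p), color))
    keyed.sort()
    angles = [a for a, _ in keyed]
    black_prefix = [0]
    for _, color in keyed:
        black_prefix.append(black_prefix[-1] + color)
    result = []
    for b2 in base_points:
        k = bisect.bisect_right(angles, angle_from(apex, v, b2))
        result.append((black_prefix[k], k - black_prefix[k]))
    return result
```

Counts for a quadrilateral such as $b'b'_0 b''_0 b''$ then follow by subtracting the counts of two nested triangles.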

When we consider a possibility in Best, the triangle $\triangle b'_0 b''_0 v$ is also an instance triangle, so we can find the white/black counts for the quadrilateral $b'b'_0 b''_0 b''$ by computing the difference between the counts for the entire triangle $\triangle b'b''v$ and triangle $\triangle b'_0 b''_0 v$, that is, in time $O(1)$.

When we consider a possibility in subroutine BestForFixedBase, we need the counts for the four parts of $\triangle b'_0 b''_0 v$. Since we already calculated the counts for $\triangle b'_0 b''_0 v$ and because we can perform subtractions, it is enough to do it for three parts. Two of them, $\triangle b'_0 b v'$ and $\triangle b''_0 b v''$, are instance triangles for Best, so we already calculated their counts. The third we choose is the triangle $\triangle v'v''v$. Note that this triangle is specified by three reference lines, so there are $O(\epsilon^{-6})$ such triangles. We make a table for all of them. To fill the table, we consider $O(\epsilon^{-5})$ groups: we group together triangles for which line $\ell$ has a common direction. By sorting samples in $S$, we can compute the counts for each group in time $o(\epsilon^{-3})$. Thus, the time for filling the second table is $o(\epsilon^{-8})$. To summarize, Algorithm 2 runs in time $O(\epsilon^{-8})$. This completes the proof of Theorem 4.1. □

Corollary 4.14. The class of convex images is properly agnostically PAC-learnable with sample complexity $O(\frac{1}{\epsilon^2}\log\frac{1}{\epsilon})$ and time complexity $O(\frac{1}{\epsilon^8})$ under the uniform distribution.

Proof. We can modify Algorithm 2 to output, along with $\hat d$, a reference polygon $\hat M$ with $\hat d(\hat M) = \hat d$. With an additional DP table, we can compute which points became its vertices. By the analysis of Algorithm 2, with probability at least 2/3, the output $\hat M$ satisfies $dist(M, \hat M) \le d_M + \epsilon$. □


5 Distance Approximation to the Nearest Connected Image 

To define connectedness, we consider the image graph $G_M$ of an image $M$. The vertices of $G_M$ are 
$\{(i,j) \mid M[i,j] = 1\}$, and two vertices $(i,j)$ and $(i',j')$ are connected by an edge if $|i-i'| + |j-j'| = 1$. 

In other words, the image graph consists of black pixels connected by the grid lines. The image is 
connected if its image graph is connected. 
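
The definition above can be checked directly with a breadth-first search over the image graph. The following is a minimal sketch, not part of the paper's algorithms: the function name `is_connected` and the list-of-rows image representation are assumptions made for illustration, and an image with no black pixels is treated as connected by convention.

```python
from collections import deque


def is_connected(image):
    """Return True if the black pixels (value 1) induce a connected image graph.

    `image` is a list of rows of 0/1 values (hypothetical representation).
    """
    n = len(image)
    black = [(i, j) for i in range(n) for j in range(len(image[i])) if image[i][j] == 1]
    if not black:
        return True  # assumption: the empty image graph counts as connected
    seen = {black[0]}
    queue = deque([black[0]])
    while queue:
        i, j = queue.popleft()
        # Neighbors are exactly the pixels with |i - i'| + |j - j'| = 1.
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < len(image[a]) and image[a][b] == 1 and (a, b) not in seen:
                seen.add((a, b))
                queue.append((a, b))
    return len(seen) == len(black)
```

Note that diagonal adjacency does not count: two black pixels that touch only at a corner lie in different components.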

Theorem 5.1. There is a block-uniform $\epsilon$-additive distance approximation algorithm for connectedness 
with sample complexity $O(\frac{1}{\epsilon^4})$ and running time $\exp(O(\frac{1}{\epsilon}))$. 

5.1 Border Connectedness 

The first idea in our algorithms for connectedness is that we can modify an image in 
relatively few places by superimposing a grid on it (as shown in Figures 5.1 and 5.2), and as a result obtain 
an image whose distance to connectedness is determined by the properties of individual squares 
into which the grid lines partition the image. The squares and the relevant property of the squares 
are defined next. 

For a set $S \subseteq [0..n)^2$ and $(i,j) \in [0..n)^2$, we define $S + (i,j) = \{(x+i, y+j) : (x,y) \in S\}$. 

Definition 5.1 (Squares and grid pixels). Fix a side length $n \equiv 1 \pmod{r}$. For all integers 
$i,j \in [0..n-r)$, where $i$ and $j$ are divisible by $r$, the $(r-1) \times (r-1)$ image that consists of all 
pixels in $[r-1]^2 + (i,j)$ is called an r-square of $M$. The set of all r-squares of $M$ is denoted $S_r$. 

The pixels that do not lie in any squares of $S_r$, i.e., pixels $(i,j)$ where $i$ or $j$ is divisible by $r$, 
are called grid pixels. The set of all grid pixels is denoted by $GP_r$. 

Claim 5.2. $|GP_r| < 2n^2/r$. 

Proof. $|GP_r| = 2\left(\frac{n-1}{r} + 1\right)n - \left(\frac{n-1}{r} + 1\right)^2 < 2n^2/r$. □ 
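
The counting identity in the proof can be sanity-checked by enumeration. The sketch below is illustrative only (the function names are hypothetical): it counts grid pixels by brute force and compares the result with the closed form, where $g = (n-1)/r + 1$ is the number of grid rows (and of grid columns), each containing $n$ pixels, with the $g^2$ crossing pixels counted once.

```python
def grid_pixel_count(n, r):
    """Count pixels (i, j) in [0..n)^2 with i or j divisible by r, by enumeration."""
    if (n - 1) % r != 0:
        raise ValueError("Definition 5.1 assumes n = 1 (mod r)")
    return sum(1 for i in range(n) for j in range(n) if i % r == 0 or j % r == 0)


def grid_pixel_formula(n, r):
    """Closed form from the proof of Claim 5.2: 2*g*n - g^2 with g = (n-1)/r + 1."""
    g = (n - 1) // r + 1
    return 2 * g * n - g * g
```

For example, with n = 9 and r = 4 there are g = 3 grid rows and 3 grid columns, giving 2·3·9 − 9 = 45 grid pixels.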

Note that a square consists of pixels of an r-block, with the pixels of the first row and column 
removed. Therefore, a block-uniform algorithm can obtain a uniformly random r-square. 

Recall the definition of the border of an image from Section 2. 

Figure 5.1: An image M. 

Figure 5.2: A gridded image obtained from M. 


Definition 5.2 (Border connectedness). A (sub)image $S$ is border-connected if for every black 
pixel $(i,j)$ of $S$, the image graph $G_S$ contains a path from $(i,j)$ to a pixel on the border. The 
property border connectedness, denoted $C'$, is the set of all border-connected images. 
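
Border connectedness can be tested with a single multi-source breadth-first search started from all black border pixels. This is a minimal sketch under an assumption: the border of a k x k image is taken to be its first and last rows and columns (the precise definition is in Section 2, which is not reproduced here). The name `is_border_connected` is hypothetical.

```python
from collections import deque


def is_border_connected(square):
    """Return True if every black pixel of a k x k 0/1 image reaches the border.

    Assumption: the border consists of the first/last row and first/last column.
    """
    k = len(square)
    seen = set()
    queue = deque()
    # Seed the search with every black pixel that lies on the border.
    for i in range(k):
        for j in range(k):
            if square[i][j] == 1 and (i in (0, k - 1) or j in (0, k - 1)):
                seen.add((i, j))
                queue.append((i, j))
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < k and 0 <= b < k and square[a][b] == 1 and (a, b) not in seen:
                seen.add((a, b))
                queue.append((a, b))
    total_black = sum(row.count(1) for row in square)
    return len(seen) == total_black
```

An all-white image is border-connected vacuously, which matches the quantifier "for every black pixel" in the definition.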

5.2 Proof of Theorem 5.1 

The main idea behind Algorithm 5, used to prove Theorem 5.1, is to relate the distance to connectedness 
to the distance to another property, which we call grid connectedness. The latter distance 
is the average over squares of the distances of these squares to border connectedness. The average 
can be easily estimated by looking at a sample of the squares. 

W.l.o.g. assume that $n \equiv 1 \pmod{4/\epsilon}$. (Otherwise, we can pad the image with white pixels 
without changing whether it is connected and adjust the accuracy parameter.) 


Algorithm 5: Distance approximation to connectedness. 
input : $n \in \mathbb{N}$ and $\epsilon \in (0,1/4)$; block-sample access to an $n \times n$ binary matrix $M$. 

1 Sample $s = 4/\epsilon^2$ squares uniformly and independently from $S_{4/\epsilon}$ (see Definition 5.1). 

// This can be done by drawing random blocks from the $4/\epsilon$-partition of $[0..n)^2$. 

2 For each such square $S$, compute $\mathrm{dist}(S,C')$ (see Section 5.3 for details), where $C'$ is border 
connectedness (see Definition 5.2). Let $\hat{d}_{squares}$ be the average of computed distances 
$\mathrm{dist}(S,C')$. 

3 return $\hat{d} = \left((1 - \frac{\epsilon}{4})(1 - \frac{1}{n})\right)^2 \cdot \hat{d}_{squares}$. 
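
The sampling structure of Algorithm 5 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the image is a list of 0/1 rows, the parameter `r` stands in for $4/\epsilon$ (assumed to be an integer with $n \equiv 1 \pmod{r}$), `s` is the number of sampled squares, and `dist_to_border_connected` is a hypothetical callable that returns $\mathrm{dist}(S, C')$ for one square (Section 5.3 describes how the paper computes it).

```python
import random


def estimate_distance_to_connectedness(image, r, s, dist_to_border_connected, rng=None):
    """Sketch of Algorithm 5 with r in the role of 4/eps and s sampled squares."""
    rng = rng or random.Random()
    n = len(image)
    if (n - 1) % r != 0:
        raise ValueError("expected n = 1 (mod r)")
    blocks_per_side = (n - 1) // r
    total = 0.0
    for _ in range(s):
        # Pick a uniformly random r-block; its top-left corner is a grid pixel.
        bi = rng.randrange(blocks_per_side) * r
        bj = rng.randrange(blocks_per_side) * r
        # The r-square is the block without its first row and first column.
        square = [row[bj + 1:bj + r] for row in image[bi + 1:bi + r]]
        total += dist_to_border_connected(square)
    avg = total / s
    # Scaling factor ((1 - eps/4)(1 - 1/n))^2, using eps/4 = 1/r.
    scale = ((1 - 1 / r) * (1 - 1 / n)) ** 2
    return scale * avg
```

Passing the per-square distance as a callable keeps the sampling logic separate from the exponential-time subroutine, so a brute-force routine can be substituted on tiny squares for testing.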


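Definition 5.3. Fix $\epsilon \in (0,1/4)$. Let image $M_\epsilon$ be a gridded image obtained from image $M$ as 
follows: 

$$M_\epsilon[i,j] = \begin{cases} 1 & \text{if } (i,j) \text{ is a grid pixel from } GP_{4/\epsilon}; \\ M[i,j] & \text{otherwise.} \end{cases}$$

Let $C$ be the set of all connected images. For $\epsilon \in (0,1/4)$, define grid connectedness $C_\epsilon = \{M \mid 
M \in C, \text{ and } M[i,j] = 1 \text{ for all } (i,j) \in GP_{4/\epsilon}\}$. 

Lemma 5.3. Let $d_M = \mathrm{dist}(M,C)$ and $d_\epsilon = \mathrm{dist}(M_\epsilon, C_\epsilon)$. Then $d_M - \frac{\epsilon}{2} \le d_\epsilon \le d_M$. Moreover, 

$$d_\epsilon = \left((1 - \tfrac{\epsilon}{4})(1 - \tfrac{1}{n})\right)^2 \cdot \frac{1}{|S_{4/\epsilon}|} \sum_{S \in S_{4/\epsilon}} \mathrm{dist}(S, C').$$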

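Proof. First, we prove that $d_\epsilon \le d_M$. Let $M'$ be a connected image such that $\mathrm{dist}(M,M') = d_M$. 
Then $M'_\epsilon$, the gridded image obtained from $M'$, satisfies $C_\epsilon$. Since $\mathrm{dist}(M_\epsilon, M'_\epsilon) \le d_M$, it follows 
that $d_\epsilon \le d_M$. 

Now we show that $d_M - \frac{\epsilon}{2} \le d_\epsilon$. Let $M''_\epsilon \in C_\epsilon$ be such that $\mathrm{dist}(M_\epsilon, M''_\epsilon) = d_\epsilon$. Then $M''_\epsilon \in C$ 
and, by Claim 5.2, $\mathrm{dist}(M, M''_\epsilon) \le |GP_{4/\epsilon}|/n^2 + d_\epsilon \le \epsilon/2 + d_\epsilon$, implying $d_M \le \epsilon/2 + d_\epsilon$, as required. 

Finally, observe that to make $M_\epsilon$ satisfy $C_\epsilon$, it is necessary and sufficient to ensure that each 
square satisfies $C'$. In other words, 

$$d_\epsilon n^2 = \sum_{S \in S_{4/\epsilon}} \mathrm{Dist}(S,C') = (4/\epsilon - 1)^2 \sum_{S \in S_{4/\epsilon}} \mathrm{dist}(S,C').$$

Since $|S_{4/\epsilon}| = \left(\frac{(n-1)\epsilon}{4}\right)^2$, the desired expression for $d_\epsilon$ follows. □ 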
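
For readers who want the last step spelled out, the following worked derivation shows how the scaling factor arises. It uses only the identity for $d_\epsilon n^2$ and the count of squares, $((n-1)\epsilon/4)^2$, stated in the proof; the shorthand $\mathcal{S}$ for the set of $4/\epsilon$-squares is introduced here for brevity.

```latex
\begin{align*}
d_\epsilon
  &= \frac{(4/\epsilon - 1)^2}{n^2} \sum_{S \in \mathcal{S}} \mathrm{dist}(S, C')
   = \frac{(4/\epsilon - 1)^2 \, |\mathcal{S}|}{n^2}
     \cdot \frac{1}{|\mathcal{S}|} \sum_{S \in \mathcal{S}} \mathrm{dist}(S, C') \\
  &= \left( \frac{(4/\epsilon - 1) \cdot \epsilon \, (n-1)}{4n} \right)^{2}
     \cdot \frac{1}{|\mathcal{S}|} \sum_{S \in \mathcal{S}} \mathrm{dist}(S, C') \\
  &= \left( \Bigl(1 - \frac{\epsilon}{4}\Bigr)\Bigl(1 - \frac{1}{n}\Bigr) \right)^{2}
     \cdot \frac{1}{|\mathcal{S}|} \sum_{S \in \mathcal{S}} \mathrm{dist}(S, C').
\end{align*}
```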

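Analysis of Algorithm 5. Let $d_{squares} = \frac{1}{|S_{4/\epsilon}|} \sum_{S \in S_{4/\epsilon}} \mathrm{dist}(S,C')$. Recall that $\hat{d}_{squares}$ is 
the empirical average computed by the algorithm. By the Chernoff-Hoeffding bound, $\Pr[|\hat{d}_{squares} - 
d_{squares}| > \epsilon/2] \le 2\exp(-2(\epsilon/2)^2 s) = 2e^{-2} < 1/3$. So, with probability at least 2/3, we have $|\hat{d}_{squares} - 
d_{squares}| \le \epsilon/2$. If this event happens then $|\hat{d} - d_\epsilon| \le \epsilon/2$ because by Algorithm 5 and Lemma 5.3, 
respectively, $\hat{d} = A \cdot \hat{d}_{squares}$ and $d_\epsilon = A \cdot d_{squares}$, where $A = \left((1 - \frac{\epsilon}{4})(1 - \frac{1}{n})\right)^2 < 1$. By Lemma 5.3, 
$|d_\epsilon - d_M| \le \epsilon/2$. Thus, $|\hat{d} - d_M| \le |\hat{d} - d_\epsilon| + |d_\epsilon - d_M| \le \epsilon/2 + \epsilon/2 = \epsilon$ holds with probability at least 2/3, 
as required. 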

Query and time complexity. Algorithm 5 samples $O(1/\epsilon^2)$ squares containing $O(1/\epsilon^2)$ 
pixels each. Thus, the sample complexity is $O(1/\epsilon^4)$. 

The most expensive step in Algorithm 5 is Step 2 where the distance of a square $S$ to border 
connectedness is calculated. By Theorem 5.4 (see Section 5.3), the running time of this step for one 
square is $\exp(O(\frac{1}{\epsilon}))$ and it is called $O(1/\epsilon^2)$ times. Therefore, the running time of Algorithm 5 is 
$\exp(O(\frac{1}{\epsilon}))$, as claimed. 

5.3 Algorithm for Border Connectedness 

Theorem 5.4. Let $S$ be a $k \times k$ image. There is an algorithm that computes $\mathrm{dist}(S,C')$ (i.e., 
distance of $S$ to border connectedness) in time $\exp(O(k))$. 

Proof. To prove the theorem we give a dynamic programming algorithm that computes $\mathrm{dist}(S,C')$ 
in the following way: starting from row 1 of $S$, it processes a row and proceeds to the next one. 
The algorithm stops after processing row $k$. The information the algorithm computes and stores 
for each row is explained later in this section. 





Definition 5.4. Let $cl \in \{0,1\}^k$ be a vector. Call maximal consecutive runs of 1's in $cl$ 1-blocks 
and let $n(cl)$ denote the number of 1-blocks in $cl$. Let $1^t$ (respectively, $0^t$) denote the string of $t$ 
ones (respectively, zeros) and let $\Sigma = \{0, 1, <, \times, >\}$. 

Consider a $k \times k$ image $S$. Recall that $G_S$ denotes the image graph of $S$. For every $i \in [k]$, 
denote the subgraph of $G_S$, induced by the first $i$ rows in $S$, by $G_S^i$. Index 1-blocks in row $i$ of $S$ in 
the increasing order of indices of pixels they contain. For example, a row 001110011 contains two 
1-blocks; the 1-block with three 1's has index 1, the 1-block with two 1's has index 2. Each 1-block 
in row $i$ has one of the following 5 statuses w.r.t. $G_S^i$: 

• connected to the border of S (denoted by 1); 

• isolated, i.e., connected neither to the border nor to any other 1-block in its row (denoted by 0); 

• first 1-block in its connected component, i.e., it is in the connected component with other 
1-blocks of row i and has the smallest index among them (denoted by <); 

• intermediate 1-block in its connected component, i.e., has neither largest nor smallest index 
in its connected component (denoted by $\times$); 

• last 1-block in its connected component, i.e., it is in the connected component with other 
1-blocks of row i and has the largest index among them (denoted by >). 
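
To make the five statuses concrete, the sketch below computes the coloring and the status of every 1-block of row i directly from an image, by labeling connected components of the subgraph induced by the first i rows. It is an illustration only: the function name `row_configuration` is hypothetical, the character 'x' stands for the symbol ×, rows are 1-indexed as in the text, and the border is assumed to be the first and last rows and columns of the full k x k image.

```python
from collections import deque


def row_configuration(square, i):
    """Return (coloring, statuses) of row i (1-indexed) w.r.t. the first i rows."""
    k = len(square)
    comp = {}            # pixel -> component label within the first i rows
    touches_border = []  # label -> True if the component contains a border pixel
    for a in range(i):
        for b in range(k):
            if square[a][b] != 1 or (a, b) in comp:
                continue
            label = len(touches_border)
            touches_border.append(False)
            comp[(a, b)] = label
            queue = deque([(a, b)])
            while queue:
                x, y = queue.popleft()
                if x in (0, k - 1) or y in (0, k - 1):
                    touches_border[label] = True
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    u, v = x + dx, y + dy
                    if 0 <= u < i and 0 <= v < k and square[u][v] == 1 and (u, v) not in comp:
                        comp[(u, v)] = label
                        queue.append((u, v))
    cl = list(square[i - 1])
    # Component label of each 1-block, taken at the block's leftmost pixel.
    labels = [comp[(i - 1, b)] for b in range(k) if cl[b] == 1 and (b == 0 or cl[b - 1] == 0)]
    st = []
    for idx, label in enumerate(labels):
        same = [t for t, other in enumerate(labels) if other == label]
        if touches_border[label]:
            st.append('1')
        elif len(same) == 1:
            st.append('0')
        elif idx == same[0]:
            st.append('<')
        elif idx == same[-1]:
            st.append('>')
        else:
            st.append('x')
    return cl, st
```

For instance, if rows 2 and 3 of a 5 x 5 image are 01110 and 01010 and all other rows are white, then row 3 has two 1-blocks joined through row 2 and not connected to the border, so their statuses are < and >.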


Definition 5.5. Let $cl$ denote the coloring of $S$ in row $i$, for some $i \in [k]$ (i.e., $cl = S[i]$). Statuses 
of 1-blocks of $cl$ w.r.t. $G_S^i$ are captured by a status vector $st \in \Sigma^{n(cl)}$. The pair $(cl, st)$ is called the 
configuration w.r.t. $G_S^i$. 

Definition 5.6. For all $i \in [k]$, $cl \in \{0,1\}^k$, and vectors $st$ over $\Sigma$ of length at most $k$, define 
$B(i,cl,st) = \{S' \mid S'$ is a $k \times k$ border-connected image with configuration $(cl,st)$ w.r.t. $G_{S'}^i\}$. 

For all $i \in [k]$ and $k \times k$ images $S'$, let $cost_i(S,S')$ denote the number of pixels on which the 
first $i$ rows of $S$ and $S'$ differ. For all $i \in [k]$, $col \in \{0,1\}^k$, and $st \in \Sigma^k$ define 

$$cost(i, col, st) = \begin{cases} \min_{S' \in B(i,col,st)} cost_i(S,S') & \text{if } B(i,col,st) \neq \emptyset, \\ \infty & \text{otherwise.} \end{cases}$$


Note that the number of all possible configurations for a row is at most $2^k \cdot 5^k = \exp(O(k))$. We show 
that if for some $i \in [k-1]$, the value of $cost(i, cl, st)$ is known for every configuration $(cl, st)$ in row 
$i$, then for every configuration $(cl', st')$ in row $i+1$, the cost $cost(i+1, cl', st')$ can be computed in 
time exponential in $k$. This is a crucial ingredient that helps us to show that the running time of 
our algorithm is exponential in $k$. 

For a fixed $i \in [k-1]$, consider an image $S' \in B(i,cl,st)$. Let $S''$ be an image whose first $i$ rows 
are the same as in $S'$. Let $cl'$ denote the coloring of row $i+1$ in $S''$. If $S''$ is border-connected then 
configuration $(cl,st)$ is consistent with coloring $cl'$, i.e., every 1-block in $cl$ that has status other 
than 1 w.r.t. $G_{S''}^i$ is connected to a 1-block in $cl'$ w.r.t. $G_{S''}^{i+1}$. Moreover, for some status vector 
$st' \in \Sigma^{n(cl')}$, image $S'' \in B(i+1, cl', st')$ and $st'$ can be determined from $cl$, $cl'$, and $st$. Observe that 
if $cost(i,cl,st)$ is known for every configuration $(cl,st)$, then $cost(i+1,cl',st')$ can be computed. 
After computing costs for all configurations in each row, $\mathrm{dist}(S,C') = \min_{cl,st} cost(k,cl,st)$ can 
be found. In order to find $st'$, our algorithm uses subroutine ComputeStatus. ComputeStatus uses 
subroutine ConstructGraph that creates a graph whose nodes are 1-blocks of $cl$ and whose edges are defined 
according to the information provided by $st$. 

Now we show how subroutine ComputeStatus computes $st'$ from colorings $col$, $col'$, and the status 
vector $st \in \Sigma^{num(col)}$ of $col$ w.r.t. $G_I^i$, where $I$ is an image from $B(i,col,st)$ for $i \in [k-1]$. Let 
$n_1 = num(col)$ and $n_2 = num(col')$. Index 1-blocks in $col$ in the nondecreasing order of indices of 
pixels they contain. Let $I'$ be an image obtained from $I$ by recoloring its row $i+1$ to $col'$. Index 
1-blocks in $col'$ in the nondecreasing order of indices of pixels they contain and add $n_1$ to each 
index. Consider graph $G = (V,E)$ where $V = [n_1 + n_2]$ and $E$ has every edge of the following two 
types: 

1. edges $(i,j)$, where $i,j \in [n_1]$, $i < j$, and $i$ is not connected to any $j' < j$ in $G_I^i$. 

2. $(i, n_1 + j)$, where $i \in [n_1]$, $j \in [n_2]$, and $i$ is connected to $n_1 + j$ in $G_{I'}^{i+1}$. 

ComputeStatus uses subroutine ConstructGraph to construct graph $G$. (ConstructGraph computes 
set $E$.) After graph $G$ is constructed, ComputeStatus checks whether configuration $(col,st)$ and $col'$ 
are consistent w.r.t. $G_{I'}^{i+1}$. If they are not consistent, it outputs a $\perp$ symbol. If they are consistent 
then $B(i+1, col', st') \neq \emptyset$ (rows $i+2, \ldots, k$ in $I'$ can be recolored to all black rows and the resulting 
image is in $B(i+1, col', st')$). ComputeStatus finds vector $st'$ based on the connectivity information 
of graph $G$ and information provided by vector $st$. 

To check whether $cl'$ is consistent with $(cl,st)$ and, if it is, to find the status vector $st'$, our 
algorithm uses subroutine ComputeStatus (Algorithm 8). Subroutine ComputeStatus uses 
subroutine ConstructGraph, which constructs a graph from $cl$ and $st$ that helps to compute $st'$. The 
nodes in this graph are 1-blocks of $cl$ and edges are defined according to the information provided 
by $st$. ConstructGraph is explained next. 

For each $i \in [k]$, every image $S$ has some configuration $(cl, st)$ in its $i$-th row w.r.t. $G_S^i$. Vector 
$st$ in this configuration has the information about which 1-blocks in $cl$ are in the same connected 
component in $G_S^i$ and which are connected to the border. Thus, if $(cl, st)$ is given we can construct 
a graph $G$ whose nodes are all 1-blocks of $cl$ and the status vector of $cl$ w.r.t. $G$ is $st$. Subroutine 
ConstructGraph (Algorithm 6) constructs graph $G$. 


Algorithm 6: Subroutine ConstructGraph used in Algorithm 8. 

input : vector $cl \in \{0,1\}^k$, and $st \in \Sigma^{n(cl)}$. 

1 Index 1-blocks in $cl$ in the nondecreasing order of indices of pixels they contain. 
// Let $n_1 = num(cl)$, $V = [n_1]$, $E = \emptyset$, and $stack = \emptyset$ (we maintain a stack) 

2 forall the indices $j = 1, 2, \ldots, n_1$ do 

3 if $stack \neq \emptyset$ and $st[j] = 1$ then return $\perp$ 

4 if $stack = \emptyset$ and $st[j] \in \{\times, >\}$ then return $\perp$ 

5 if $st[j] \in \{<, \times\}$ then push($j$) 

6 if $st[j] = {>}$ then do $p$ = pop($stack$); add $\{p,j\}$ to $E$ until $st[p] = {<}$ 

7 return $G = (V,E)$ 
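
The stack discipline of Algorithm 6 can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' code: statuses are given as the characters '0', '1', '<', 'x' (for ×), and '>'; the symbol ⊥ is represented by `None`; the loop in Step 6 is read as "pop until the popped block has status <"; and the helper `num_blocks` is introduced here.

```python
def num_blocks(cl):
    """Number of maximal runs of 1's in the 0/1 vector cl."""
    return sum(1 for b, v in enumerate(cl) if v == 1 and (b == 0 or cl[b - 1] == 0))


def construct_graph(cl, st):
    """Sketch of ConstructGraph: return (vertices, edges) or None for the symbol ⊥."""
    n1 = num_blocks(cl)
    if len(st) != n1:
        raise ValueError("st must have one status per 1-block of cl")
    edges = set()
    stack = []
    for j, status in enumerate(st, start=1):  # 1-blocks are indexed from 1
        if stack and status == '1':
            return None  # a border-connected block nested inside an open component
        if not stack and status in ('x', '>'):
            return None  # an intermediate or last block with no open component
        if status in ('<', 'x'):
            stack.append(j)
        elif status == '>':
            # Close the innermost open component: link its blocks to block j.
            while True:
                p = stack.pop()
                edges.add((p, j))
                if st[p - 1] == '<':
                    break
    return list(range(1, n1 + 1)), edges
```

On the status vector < × < > × > the inner pair (blocks 3 and 4) is closed first, and the outer component then links blocks 1, 2, and 5 to block 6, reflecting the nesting that planarity forces on components within a row.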


Analysis of Algorithm 7. The following lemma shows that Algorithm 7 is correct. 

Lemma 5.5. For all $i \in [k]$, $col \in \{0,1\}^k$, and $st \in \Sigma^k$, Algorithm 7 correctly computes $cost(i, col, st)$. 

Algorithm 7: Distance to border connectedness of a square S. 
input : access to a $k \times k$ square $S$. 

1 forall the indices $i \in [k]$, vectors $col \in \{0,1\}^k$, and $st \in \Sigma^{num(col)}$ do 

// For $i \in [k]$, let $r_i \in \{0,1\}^k$ be a vector that corresponds to the $i$-th row of $S$. 

2 $cost(i, col, st) = |r_1 - col|_1$ if $i = 1$ and $st = 1^{num(col)}$; $\infty$ otherwise. 

3 forall the indices $i = 2, 3, \ldots, k$, vectors $col, col' \in \{0,1\}^k$ and $st \in \Sigma^{num(col)}$ do 

4 if $cost(i-1, col, st) \neq \infty$ then $st'$ = ComputeStatus($i$, $col$, $st$, $col'$) 

5 if $st' \neq \perp$ then $cost(i, col', st') = \min\{cost(i, col', st'),\ cost(i-1, col, st) + |r_i - col'|_1\}$ 

6 return $\left(\min_{col \in \{0,1\}^k,\ st \in \Sigma^{num(col)}} cost(k, col, st)\right) \cdot k^{-2}$ 


Proof. We prove the lemma inductively. For the first row, Algorithm 7 indeed computes the cost 
of every configuration. (Note that every 1-block in the first row is connected to the border and 
thus, every such 1-block has status 1. Therefore, $cost(1, col, st) = |r_1 - col|_1$ if $st = 1^{num(col)}$, 
and $cost(1, col, st) = \infty$, otherwise.) Assume that the statement in the lemma holds for some 
row $i \in [k-1]$. We prove the statement for row $i+1$. Note that if $B(i+1, col', st') = \emptyset$ for 
some $col', st'$, then $cost(i+1, col', st') = \infty$. The algorithm correctly sets $cost(i+1, col', st')$ to $\infty$ 
and never changes it. If $B(i+1, col', st') \neq \emptyset$, consider an image $I^* \in B(i+1, col', st')$ such that 
$cost(i+1, col', st') = cost_{i+1}(S, I^*)$. Let $col$ be the coloring of row $i$ in $I^*$ and $st$ be the status 
vector of $col$ w.r.t. $G_{I^*}^i$. Then $col'$ and the configuration $(col, st)$ are consistent w.r.t. $G_{I^*}^{i+1}$. Note 
that $I^* \in B(i, col, st)$ and $cost_{i+1}(S, I^*) = cost_i(S, I^*) + |r_{i+1} - col'|_1$. Moreover, for every image 
$I_1 \in B(i, col, st)$, there is an image $I_2 \in B(i+1, col', st')$ such that $cost_i(S, I_1) = cost_i(S, I_2)$. (In 
$I_1$, recolor row $i+1$ to $col'$ and all rows $i+2, \ldots, k$ to all black rows and obtain image $I_2$. In image 
$I_2$, $col'$ and $(col, st)$ are consistent w.r.t. $G_{I_2}^{i+1}$ and every 1-block in its rows $i+1, \ldots, k$ is border-connected. 
Thus, image $I_2$ is border-connected and $I_2 \in B(i+1, col', st')$.) Therefore, $cost_i(S, I^*) = 
\min_{I \in B(i,col,st)} cost_i(S, I) = cost(i, col, st)$. At some point, the algorithm considers the configuration 
$(col, st)$ for row $i$ and the coloring $col'$ for row $i+1$. By the inductive assumption, the algorithm 
correctly computes $cost(i, col, st)$, which is equal to $cost_i(S, I^*)$. The output of ComputeStatus for 
$i$, $col$, $st$, and $col'$ will be the vector $st'$ and the algorithm sets $cost(i+1, col', st')$ to $cost(i, col, st) + 
|r_{i+1} - col'|_1 = cost_i(S, I^*) + |r_{i+1} - col'|_1$, which is the correct value. This completes the proof. □ 

By Lemma 5.5, Algorithm 7 computes the cost of every configuration in row $k$. The algorithm 
outputs the minimum one among these costs. Let $\hat{S}$ be a border-connected image such that $\mathrm{dist}(S,C') = \mathrm{dist}(S, \hat{S})$. 
Note that configurations of row $k$ that are not possible for a border-connected image have 
unbounded costs (i.e., $\infty$). Thus, row $k$ in $\hat{S}$ has some configuration which has the minimum cost 
among all configuration costs for the row. Note that the cost of a configuration in row $k$ is equal to 
the cost of recoloring of $S$ to some border-connected square. Therefore, the output of Algorithm 7 
is equal to $\mathrm{dist}(S,C')$. 
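
For very small squares, the value that Algorithm 7 is meant to return can be cross-checked by exhaustive search over all $2^{k^2}$ recolorings. The sketch below is a reference check only, not the paper's dynamic program, and is practical only for k of about 3 or 4. The function names are hypothetical, and the border is assumed to be the first and last rows and columns.

```python
from collections import deque
from itertools import product


def _border_connected(square):
    """True if every black pixel reaches the border (first/last row or column)."""
    k = len(square)
    seen = {(i, j) for i in range(k) for j in range(k)
            if square[i][j] == 1 and (i in (0, k - 1) or j in (0, k - 1))}
    queue = deque(seen)
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < k and 0 <= b < k and square[a][b] == 1 and (a, b) not in seen:
                seen.add((a, b))
                queue.append((a, b))
    return len(seen) == sum(row.count(1) for row in square)


def brute_force_distance(square):
    """Smallest fraction of pixels to change so that the square is border-connected."""
    k = len(square)
    flat = [p for row in square for p in row]
    best = k * k + 1
    for cand in product((0, 1), repeat=k * k):
        changes = sum(a != b for a, b in zip(flat, cand))
        if changes < best:
            grid = [list(cand[r * k:(r + 1) * k]) for r in range(k)]
            if _border_connected(grid):
                best = changes
    return best / (k * k)
```

The result is normalized by $k^2$, matching the factor $k^{-2}$ in Step 6 of Algorithm 7.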

Query and Time Complexity of Algorithm 7. The most expensive step in Algorithm 7 
is Step 3. Note that there are at most $k \cdot 2^k \cdot 5^k$ sets $B(\cdot,\cdot,\cdot)$ for which subroutine ComputeStatus is 
called in this step. ComputeStatus uses subroutine ConstructGraph to construct graph $G = (V,E)$. 

Algorithm 8: Subroutine ComputeStatus used in Algorithm 7. 

input : index $i$; vectors $col, col' \in \{0,1\}^k$, and $st \in \Sigma^{num(col)}$. 

1 Construct graph $G$ = ConstructGraph($col$, $col'$, $st$) // Let $G = (V,E)$. 

2 if $E = \emptyset$ then return $\perp$ 

// Let $n_1 = num(col)$, $n_2 = num(col')$ 

3 Let $col[0] = col'[0] = row_1 = row_2 = 0$ 

4 For every $j = 1, 2, \ldots, k$ 

5 if $col[j-1] = 0$ and $col[j] = 1$ then increment $row_1$ by 1 

6 if $col'[j-1] = 0$ and $col'[j] = 1$ then increment $row_2$ by 1 

7 if $col[j] \cdot col'[j] = 1$ and $\{row_1, row_2\} \notin E$ then add $\{row_1, row_2\}$ to $E$ 

8 if $\exists j \in [n_1]$ with $st[j] \neq 1$ such that $(j, n_1 + j') \notin E$ for all $j' \in [n_2]$ then return $\perp$ 

9 if $i = k$ then $st' = 1^{n_2}$ 

10 else 

11 Let $st' = 0^{n_2}$. Update $st'[1] = col'[1]$ and $st'[n_2] = col'[k]$ 

12 For each edge $(j, n_1 + j') \in E$, $j \in [n_1]$, $j' \in [n_2]$, if $st[j] = 1$ then $st'[j'] = 1$. 

13 Run BFS to find connected components in $G$. For each pair $(n_1 + j, n_1 + j')$, where 
$j, j' \in [n_2]$, if $j$ and $j'$ are connected and $st'[j] = 1$ then $st'[j'] = 1$. 

14 For each connected component of vertices $n_1 + j$ not marked by 1, where $j \in [n_2]$, update 
the corresponding entries of $st'$ with the corresponding symbols in $\Sigma$ (i.e., $st'[j] = {<}$ if it is 
the vertex with the smallest index in the component, $st'[j] = {>}$ if it is the vertex with the 
largest index in the component, and $st'[j] = \times$, otherwise). 

15 return $st'$ 

To construct type 1 and type 2 edges in $E$, subroutine ConstructGraph performs $O(n_1) + O(n_1) = 
O(k)$ operations. Thus, the running time of ConstructGraph is $O(k)$ (recall that $n_1 = num(col)$, 
$n_2 = num(col')$). Therefore, Steps 1-5 of ComputeStatus run in time $O(k)$. Among the remaining 
steps (Steps 6-10) of ComputeStatus the most expensive ones are Steps 8 and 9. Each of them 
runs in time $O(n_1 \cdot n_2) = O(k^2)$. Thus, the running time of ComputeStatus is $O(k^2)$ and 
Algorithm 7 runs in time $O(k^2 \cdot k 2^k 5^k) = \exp(O(k))$, as claimed. □ 